Fully Dynamic Euclidean k-Means
Abstract: We consider the Euclidean $k$-means clustering problem in a dynamic setting, where we have to explicitly maintain a solution (a set of $k$ centers) $S \subseteq \mathbb{R}^d$ subject to point insertions/deletions in $\mathbb{R}^d$. We present a dynamic algorithm for Euclidean $k$-means with $\mathrm{poly}(1/\epsilon)$-approximation ratio, $\tilde{O}(k^{\epsilon})$ update time, and $\tilde{O}(1)$ recourse, for any $\epsilon \in (0,1)$, even when $d$ and $k$ are both part of the input. This is the first algorithm to achieve a constant ratio with $o(k)$ update time for this problem, whereas the previous $O(1)$-approximation runs in $\tilde O(k)$ update time [Bhattacharya, Costa, Farokhnejad; STOC'25]. In fact, previous algorithms cannot go beyond $O(k)$ update time precisely because they are designed for general metrics, where an $\Omega(k)$ lower bound is known. We break this $O(k)$ barrier by devising new fundamental data structures that exploit Euclidean properties: a structure that (implicitly) maintains a clustering subject to both center and data point updates, and a range query structure that can evaluate a mergeable function over any metric ball range given as a query. To obtain these structures, we devise the first consistent hashing scheme [Czumaj, Jiang, Krauthgamer, Veselý, Yang; FOCS'22] that achieves $\tilde O(n^{\epsilon})$ running time per point evaluation with competitive parameters. Our final algorithm builds on the framework of [Bhattacharya, Costa, Farokhnejad; STOC'25] for general metrics. The key change is to redesign several critical subroutines so that they reduce to our new Euclidean data structures, replacing the general-metric implementations that are unlikely to run efficiently even when Euclidean properties are provided.
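To make the "mergeable function over a metric ball range" primitive concrete, here is a toy sketch in Python. It is not the paper's consistent hashing scheme or range query structure: it simply buckets points of $\mathbb{R}^d$ into axis-aligned grid cells, maintains a mergeable per-cell summary (a point count) under insertions and deletions, and answers a ball query by merging the summaries of all cells that can intersect the ball. All names and parameters below are our own illustrative choices.

```python
import math
from collections import defaultdict

def cell_of(p, scale):
    """Grid cell (a tuple of integers) containing point p at the given scale."""
    return tuple(math.floor(x / scale) for x in p)

class GridSummary:
    """Maintains a mergeable summary (here: a point count) per grid cell,
    under point insertions and deletions."""

    def __init__(self, scale):
        self.scale = scale
        self.count = defaultdict(int)

    def insert(self, p):
        self.count[cell_of(p, self.scale)] += 1

    def delete(self, p):
        c = cell_of(p, self.scale)
        self.count[c] -= 1
        if self.count[c] == 0:
            del self.count[c]

    def ball_count_upper(self, center, radius):
        """Merge the summaries of every cell whose closest point lies within
        `radius` of `center`. This over-counts: a cell may intersect the
        ball without all of its points lying inside it."""
        total = 0
        for c, n in self.count.items():
            # Squared distance from `center` to the cell's axis-aligned box.
            d2 = 0.0
            for i, ci in enumerate(c):
                lo, hi = ci * self.scale, (ci + 1) * self.scale
                x = center[i]
                if x < lo:
                    d2 += (lo - x) ** 2
                elif x > hi:
                    d2 += (x - hi) ** 2
            if math.sqrt(d2) <= radius:
                total += n
        return total

g = GridSummary(scale=1.0)
g.insert((0.5, 0.5))
g.insert((5.5, 5.5))
print(g.ball_count_upper((0.5, 0.5), 1.0))  # the far point's cell is excluded
```

The real data structures in the paper must do much better than this linear scan over cells: the point of the new consistent hashing scheme is to make each point evaluation run in $\tilde O(n^{\epsilon})$ time while keeping the hashing parameters competitive.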
Subjects:
Data Structures and Algorithms (cs.DS)
Cite as: arXiv:2507.11256 [cs.DS]
(or arXiv:2507.11256v4 [cs.DS] for this version)
https://doi.org/10.48550/arXiv.2507.11256
Submission history
From: Jianing Lou
[v1] Tue, 15 Jul 2025 12:30:40 UTC (90 KB)
[v2] Wed, 16 Jul 2025 16:23:17 UTC (90 KB)
[v3] Sat, 8 Nov 2025 07:30:22 UTC (97 KB)
[v4] Thu, 2 Apr 2026 07:28:33 UTC (96 KB)