GPU-RMQ: Accelerating Range Minimum Queries on Modern GPUs
Abstract: Range minimum queries are frequently used in string processing and database applications including biological sequence analysis, document retrieval, and web search. Hence, various data structures have been proposed for improving their efficiency on both CPUs and GPUs. Recent work has also shown that hardware-accelerated ray tracing on modern NVIDIA RTX graphics cards can be exploited to answer range minimum queries by expressing queries as rays, which are fired into a scene of triangles representing minima of ranges at different granularities. While these approaches are promising, they suffer from at least one of three issues: severe memory overhead, high index construction time, and low query throughput. This renders these methods practically unusable on larger arrays: for example, the state-of-the-art GPU-based approaches LCA and RTXRMQ exceed the memory capacity of an NVIDIA RTX 4090 GPU for input arrays of size >= 2^29. To tackle these problems, in this work, we present a new hierarchical approach called GPU-RMQ. GPU-RMQ first constructs a hierarchy of range minimum summaries on top of the original array in a highly parallel fashion. For query answering, only the relevant portions of the hierarchy are then processed in an optimized massively parallel scan operation. Additionally, GPU-RMQ is hybrid in design, enabling the use of both ray tracing cores and CUDA cores across different levels of the hierarchy to handle queries. Our experimental evaluation shows that GPU-RMQ outperforms the state-of-the-art approaches in terms of query throughput, especially for larger arrays, while offering a significantly lower memory footprint and up to two orders of magnitude faster index construction. In particular, it achieves up to ~8x higher throughput than LCA, ~17x higher throughput than RTXRMQ, and up to ~4800x higher throughput compared to an optimized CPU-based approach.
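The hierarchical summary idea described in the abstract can be illustrated with a minimal sequential sketch. This is an illustrative reconstruction only, not the authors' GPU implementation: it builds levels of block minima of growing granularity (all names, and the block size `B`, are our own choices), and answers a query by scanning only a few entries per level, ascending whenever the remaining range is block-aligned.

```python
def build_hierarchy(a, B=4):
    """Level 0 is the input array; each higher level stores the minimum
    of consecutive blocks of B entries of the level below, so level k
    summarizes blocks of B**k original elements."""
    levels = [list(a)]
    cur = levels[0]
    while len(cur) > 1:
        cur = [min(cur[i:i + B]) for i in range(0, len(cur), B)]
        levels.append(cur)
    return levels

def rmq(levels, l, r, B=4):
    """Return min(a[l..r]) by peeling unaligned edge entries at the
    current level and climbing one level whenever [l, r] covers whole
    blocks; at most O(B) entries are touched per level."""
    best = None
    level = 0
    while l <= r:
        if l % B != 0:
            # left edge not block-aligned: consume one entry
            best = levels[level][l] if best is None else min(best, levels[level][l])
            l += 1
        elif (r + 1) % B != 0:
            # right edge not block-aligned: consume one entry
            best = levels[level][r] if best is None else min(best, levels[level][r])
            r -= 1
        else:
            # remaining range covers whole blocks: ascend one level
            l //= B
            r //= B
            level += 1
    return best
```

In the paper's setting, the per-level work is what gets mapped onto massively parallel scans, with ray tracing cores and CUDA cores handling different levels; the sketch above only conveys the sequential logic of the hierarchy.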
Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
Cite as: arXiv:2604.01811 [cs.DB]
(or arXiv:2604.01811v1 [cs.DB] for this version)
https://doi.org/10.48550/arXiv.2604.01811
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Felix Schuhknecht [v1] Thu, 2 Apr 2026 09:23:08 UTC (1,250 KB)