Research Papers research paper arxiv ai artificial-intelligence

GUIDE: Resolving Domain Bias in GUI Agents through Real-Time Web Video Retrieval and Plug-and-Play Annotation

arXivby [Submitted on 27 Mar 2026]March 30, 20262 min read1 views

arXiv:2603.26266v1 Announce Type: new Abstract: Large vision-language models have endowed GUI agents with strong general capabilities for interface understanding and interaction. However, due to insufficient exposure to domain-specific software operation data during training, these agents exhibit significant domain bias - they lack familiarity with the specific operation workflows (planning) and UI element layouts (grounding) of particular applications, limiting their real-world task performance. In this paper, we present GUIDE (GUI Unbiasing via Instructional-Video Driven Expertise), a traini — Rui Xie, Zhi Gao, Chenrui Shi, Zirui Shang, Lu Chen, Qing Li

View PDF HTML (experimental)

Abstract:Large vision-language models have endowed GUI agents with strong general capabilities for interface understanding and interaction. However, due to insufficient exposure to domain-specific software operation data during training, these agents exhibit significant domain bias - they lack familiarity with the specific operation workflows (planning) and UI element layouts (grounding) of particular applications, limiting their real-world task performance. In this paper, we present GUIDE (GUI Unbiasing via Instructional-Video Driven Expertise), a training-free, plug-and-play framework that resolves GUI agent domain bias by autonomously acquiring domain-specific expertise from web tutorial videos through a retrieval-augmented automated annotation pipeline. GUIDE introduces two key innovations. First, a subtitle-driven Video-RAG pipeline unlocks video semantics through subtitle analysis, performing progressive three-stage retrieval - domain classification, topic extraction, and relevance matching - to identify task-relevant tutorial videos. Second, a fully automated annotation pipeline built on an inverse dynamics paradigm feeds consecutive keyframes enhanced with UI element detection into VLMs, inferring the required planning and grounding knowledge that are injected into the agent's corresponding modules to address both manifestations of domain bias. Extensive experiments on OSWorld demonstrate GUIDE's generality as a plug-and-play component for both multi-agent systems and single-model agents. It consistently yields over 5% improvements and reduces execution steps - without modifying any model parameters or architecture - validating GUIDE as an architecture-agnostic enhancement to bridge GUI agent domain bias.

Comments: 28 pages, 8 figures, 7 tables

Subjects:

Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.26266 [cs.AI]

(or arXiv:2603.26266v1 [cs.AI] for this version)

https://doi.org/10.48550/arXiv.2603.26266

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Rui Xie [view email] [v1] Fri, 27 Mar 2026 10:33:08 UTC (4,985 KB)

Original source

arXiv

https://arxiv.org/abs/2603.26266

Was this article helpful?

Ask AI about this article

Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

Research PapersLive

Probabilistic AVL Trees (p-AVL): Relaxing Deterministic Balancing

arXiv:2604.02223v1 Announce Type: new Abstract: This paper studies the empirical behaviour of the p-AVL tree, a probabilistic variant of the AVL tree in which each imbalance is repaired with probability $p$. This gives an exact continuous interpolation from $p = 0$, which recovers the BST endpoint, to $p = 1$, which recovers the standard AVL tree. Across random-order insertion experiments, we track rotations per node, total imbalance events, average depth, average height, and a global imbalance statistic $\sigma$. The main empirical result is that even small nonzero p already causes a strong structural change. The goal here is empirical rather than fully theoretical: to document the behaviour of the p-AVL family clearly and identify the main patterns.

arXiv cs.DS

1mabout 1 hour ago

Research PapersLive

A Constant-Approximation Distance Labeling Scheme under Polynomially Many Edge Failures

arXiv:2604.01829v1 Announce Type: new Abstract: A fault-tolerant distance labeling scheme assigns a label to each vertex and edge of an undirected weighted graph $G$ with $n$ vertices so that, for any edge set $F$ of size $|F| \leq f$, one can approximate the distance between $p$ and $q$ in $G \setminus F$ by reading only the labels of $F \cup \{p,q\}$. For any $k$, we present a deterministic polynomial-time scheme with $O(k^{4})$ approximation and $\tilde{O}(f^{4}n^{1/k})$ label size. This is the first scheme to achieve a constant approximation while handling any number of edge faults $f$, resolving the open problem posed by Dory and Parter [DP21]. All previous schemes provided only a linear-in-$f$ approximation [DP21, LPS25]. Our labeling scheme directly improves the state of the art in

arXiv cs.DS

1mabout 1 hour ago

Research PapersLive

Adaptive Fully Dynamic $k$-Center Clustering with (Near-)Optimal Worst-Case Guarantees

arXiv:2604.01726v1 Announce Type: new Abstract: Given a sequence of adversarial point insertions and point deletions, is it possible to simultaneously optimize the approximation ratio, update time, and recourse for a $k$-clustering problem? If so, can this be achieved with worst-case guarantees against an adaptive adversary? These questions have garnered significant attention in recent years. Prior works by Bhattacharya, Costa, Garg, Lattanzi, and Parotsidis [FOCS '24] and by Bhattacharya, Costa, and Farokhnejad [STOC '25] have taken significant steps toward this direction for the $k$-median clustering problem and its generalization, the $(k, z)$-clustering problem. In this paper, we study the $k$-center clustering problem, which is one of the most classical and well-studied $k$-clustering

arXiv cs.DS

2mabout 1 hour ago

Knowledge Map

TopicsEntitiesSource

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 159 connections

Scroll to zoom · drag to pan · click to open

Discussion

No comments yet — be the first to share your thoughts!

More in Research Papers

Research PapersLive

Probabilistic AVL Trees (p-AVL): Relaxing Deterministic Balancing

arXiv cs.DS

1mabout 1 hour ago

Research PapersLive

A Constant-Approximation Distance Labeling Scheme under Polynomially Many Edge Failures

arXiv cs.DS

1mabout 1 hour ago

Research PapersLive

Adaptive Fully Dynamic $k$-Center Clustering with (Near-)Optimal Worst-Case Guarantees

arXiv cs.DS

2mabout 1 hour ago

Research PapersLive

Single-Pass Streaming CSPs via Two-Tier Sampling

arXiv:2604.01575v1 Announce Type: new Abstract: We study the maximum constraint satisfaction problem, Max-CSP, in the streaming setting. Given $n$ variables, the constraints arrive sequentially in an arbitrary order, with each constraint involving only a small subset of the variables. The objective is to approximate the maximum fraction of constraints that can be satisfied by an optimal assignment in a single pass. The problem admits a trivial near-optimal solution with $O(n)$ space, so the major open problem in the literature has been the best approximation achievable when limiting the space to $o(n)$. The answer to the question above depends heavily on the CSP instance at hand. The integrality gap $\alpha$ of an LP relaxation, known as the BasicLP, plays a central role. In particular, a

arXiv cs.DS

2mabout 1 hour ago