
Do Phone-Use Agents Respect Your Privacy?

HuggingFace Papers · April 1, 2026 · 2 min read


Published on Apr 1


Abstract

We study whether phone-use agents respect privacy while completing benign mobile tasks. This question has remained hard to answer because privacy-compliant behavior is not operationalized for phone-use agents, and ordinary apps do not reveal exactly what data agents type into which form entries during execution. To make this question measurable, we introduce MyPhoneBench, a verifiable evaluation framework for privacy behavior in mobile agents. We operationalize privacy-respecting phone use as permissioned access, minimal disclosure, and user-controlled memory through a minimal privacy contract, iMy, and pair it with instrumented mock apps plus rule-based auditing that make unnecessary permission requests, deceptive re-disclosure, and unnecessary form filling observable and reproducible. Across five frontier models on 10 mobile apps and 300 tasks, we find that task success, privacy-compliant task completion, and later-session use of saved preferences are distinct capabilities, and no single model dominates all three. Evaluating success and privacy jointly reshuffles the model ordering relative to either metric alone. The most persistent failure mode across models is simple data minimization: agents still fill optional personal entries that the task does not require. These results show that privacy failures arise from over-helpful execution of benign tasks, and that success-only evaluation overestimates the deployment readiness of current phone-use agents. All code, mock apps, and agent trajectories are publicly available at https://github.com/tangzhy/MyPhoneBench.
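The idea of rule-based auditing combined with joint success/privacy scoring can be illustrated with a small sketch. Everything in it is a hypothetical assumption, not MyPhoneBench's actual code or data: the `Action`/`Episode` schema, the helper `ep`, the two rules (unnecessary form filling and unnecessary permission requests; deceptive re-disclosure and memory checks are not modeled), and the toy trajectories. It shows how scoring success alone versus counting a success only when the audit finds no violations can rank two agents differently.

```python
from dataclasses import dataclass, field
from fractions import Fraction

# Hypothetical trajectory schema for illustration only; MyPhoneBench's
# real log format and rule set are defined in its repository.
@dataclass(frozen=True)
class Action:
    kind: str    # "fill" (type into a form entry) or "request_permission"
    target: str  # form-entry name or permission name

@dataclass
class Episode:
    success: bool                        # did the agent complete the task?
    required_fields: frozenset[str]      # entries the task actually needs
    needed_permissions: frozenset[str]   # permissions the task actually needs
    actions: list[Action] = field(default_factory=list)

def audit(ep: Episode) -> list[str]:
    """Rule-based check: flag fills and permission requests the task does not need."""
    violations = []
    for a in ep.actions:
        if a.kind == "fill" and a.target not in ep.required_fields:
            violations.append(f"unnecessary fill: {a.target}")
        elif a.kind == "request_permission" and a.target not in ep.needed_permissions:
            violations.append(f"unnecessary permission: {a.target}")
    return violations

def scores(episodes: list[Episode]) -> tuple[Fraction, Fraction]:
    """Return (success rate, privacy-compliant success rate) as exact fractions."""
    n = len(episodes)
    success = Fraction(sum(e.success for e in episodes), n)
    compliant = Fraction(sum(e.success and not audit(e) for e in episodes), n)
    return success, compliant

# Toy data: every task needs only "name" and "phone", plus the "contacts" permission.
REQ = frozenset({"name", "phone"})
PERMS = frozenset({"contacts"})

def ep(success: bool, *actions: tuple[str, str]) -> Episode:
    return Episode(success, REQ, PERMS, [Action(k, t) for k, t in actions])

model_a = [  # always succeeds, but over-shares on two tasks
    ep(True, ("fill", "name"), ("fill", "phone"), ("fill", "birthday")),
    ep(True, ("fill", "name"), ("fill", "phone"), ("request_permission", "location")),
    ep(True, ("fill", "name"), ("fill", "phone")),
]
model_b = [  # fails one task, but never over-shares
    ep(True, ("fill", "name"), ("fill", "phone")),
    ep(True, ("fill", "name"), ("request_permission", "contacts"), ("fill", "phone")),
    ep(False, ("fill", "name")),
]

for name, eps in [("A", model_a), ("B", model_b)]:
    s, c = scores(eps)
    print(name, f"success={float(s):.2f}", f"compliant={float(c):.2f}")
# A success=1.00 compliant=0.33
# B success=0.67 compliant=0.67
```

Here model A leads on success alone, while model B leads once a success counts only if the audit is clean, a miniature version of the reshuffled ordering the abstract reports. The numbers are toy values, not results from the paper.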


Get this paper in your agent:

hf papers read 2604.00986

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

