Live
Black Hat USADark ReadingBlack Hat AsiaAI BusinessWhat next for the struggling rural mothers in China who helped to build AI?SCMP Tech (Asia AI)Apple reportedly signed a 3rd-party driver, by Tiny Corp, for AMD or Nvidia eGPUs for Apple Silicon Macs; it s meant for AI research, not accelerating graphics (AppleInsider)TechmemeBest Resume Builders in 2026: I Applied to 50 Jobs to Test TheseDEV CommunityTruth Technology and the Architecture of Digital TrustDEV CommunityI Switched From GitKraken to This Indie Git Client and I’m Not Going BackDEV CommunityWhy I Run 22 Docker Services at HomeDEV CommunityHow to Embed ChatGPT in Your Website: 5 Methods Compared [2026 Guide]DEV CommunityThe Spaceballs sequel will be released in April next yearEngadgetResearch across 1,372 participants and 9K+ trials details "cognitive surrender", where most subjects had minimal AI skepticism and accepted faulty AI reasoning (Kyle Orland/Ars Technica)TechmemeUnpacking Peter Thiel s big bet on solar-powered cow collarsTechCrunchTemplate Literals in JavaScriptDEV Community90 Autonomous Runs: What an AI Agent Society Actually Looks LikeDEV CommunityBlack Hat USADark ReadingBlack Hat AsiaAI BusinessWhat next for the struggling rural mothers in China who helped to build AI?SCMP Tech (Asia AI)Apple reportedly signed a 3rd-party driver, by Tiny Corp, for AMD or Nvidia eGPUs for Apple Silicon Macs; it s meant for AI research, not accelerating graphics (AppleInsider)TechmemeBest Resume Builders in 2026: I Applied to 50 Jobs to Test TheseDEV CommunityTruth Technology and the Architecture of Digital TrustDEV CommunityI Switched From GitKraken to This Indie Git Client and I’m Not Going BackDEV CommunityWhy I Run 22 Docker Services at HomeDEV CommunityHow to Embed ChatGPT in Your Website: 5 Methods Compared [2026 Guide]DEV CommunityThe Spaceballs sequel will be released in April next yearEngadgetResearch across 1,372 participants and 9K+ trials details "cognitive surrender", where most subjects had minimal AI skepticism and accepted faulty AI reasoning (Kyle Orland/Ars Technica)TechmemeUnpacking Peter Thiel s big bet on solar-powered cow collarsTechCrunchTemplate Literals in JavaScriptDEV Community90 Autonomous Runs: What an AI Agent Society Actually Looks LikeDEV Community
AI NEWS HUBbyEIGENVECTOREigenvector

Automating Android Build Repair: Bridging the Reasoning-Execution Gap in LLM Agents with Domain-Specific Tools

arXiv cs.SEby [Submitted on 9 Oct 2025 (v1), last revised 1 Apr 2026 (this version, v3)]April 3, 20262 min read1 views
Source Quiz

arXiv:2510.08640v3 Announce Type: replace Abstract: Android is the largest mobile platform, yet automatically building applications remains a practical challenge. While Large Language Models (LLMs) show promise for code repair, their use for fixing Android build errors remains underexplored. To address this gap, we first introduce AndroidBuildBench, a benchmark of 1,019 build failures curated from the commit histories of 43 open-source Android projects. Each problem is paired with a verified solution from a subsequent commit, ensuring that fixes are feasible. Second, we propose GradleFixer, an LLM agent with domain-specific tools for inspecting and manipulating the Gradle build environment. GradleFixer achieves a resolve rate of 81.4% (pass@1), significantly outperforming a state-of-the-ar

View PDF HTML (experimental)

Abstract:Android is the largest mobile platform, yet automatically building applications remains a practical challenge. While Large Language Models (LLMs) show promise for code repair, their use for fixing Android build errors remains underexplored. To address this gap, we first introduce AndroidBuildBench, a benchmark of 1,019 build failures curated from the commit histories of 43 open-source Android projects. Each problem is paired with a verified solution from a subsequent commit, ensuring that fixes are feasible. Second, we propose GradleFixer, an LLM agent with domain-specific tools for inspecting and manipulating the Gradle build environment. GradleFixer achieves a resolve rate of 81.4% (pass@1), significantly outperforming a state-of-the-art coding agent that relies on a general-purpose shell. GradleFixer's success suggests that while LLMs possess the high-level knowledge to solve these failures, they struggle to translate this knowledge into effective low-level actions using a general-purpose shell. We demonstrate the effectiveness of a strategy we term Tool Bridging, which replaces general-purpose shell commands with domain-aware abstractions. We hypothesize this approach works through two mechanisms: 1) it provides tools in an API-like format that LLMs use more reliably, and 2) it constrains the action space to relevant operations. This approach bridges the gap between the model's high-level reasoning and effective low-level execution.

Subjects:

Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

Cite as: arXiv:2510.08640 [cs.SE]

(or arXiv:2510.08640v3 [cs.SE] for this version)

https://doi.org/10.48550/arXiv.2510.08640

arXiv-issued DOI via DataCite

Submission history

From: Ha Min Son [view email] [v1] Thu, 9 Oct 2025 01:33:25 UTC (83 KB) [v2] Wed, 19 Nov 2025 18:46:34 UTC (79 KB) [v3] Wed, 1 Apr 2026 19:05:59 UTC (81 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by Eigenvector · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

modellanguage modelbenchmark

Knowledge Map

Knowledge Map
TopicsEntitiesSource
Automating …modellanguage mo…benchmarkannounceopen-sourceapplicationarXiv cs.SE

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 210 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!