Google has released another set of benchmark results ranking the best AI models for Android coding, along with how much each model costs per token. Google's own Gemini 3.5 Flash is easily the most resource-intensive model for Android development, and it doesn't even make the top five.

As the hype around general-purpose chatbots dies down, companies like Google, OpenAI, and Anthropic are shifting toward agentic models with strong coding abilities. Users have begun relying on these models for "vibe coding," which essentially offloads the bulk of software development to LLMs.

Recent models have improved dramatically at Android coding, and Google has tracked which models perform best over the past few months. The "Android Bench" is updated as Google releases its own models, like the recent Gemini 3.5 Flash, and compares them against the competition.

The main takeaway is how Google breaks these models down. Each model gets a score out of 100, representing the percentage of Android coding tasks it successfully solves across 10 runs. Google also lists each model's expected performance and the date of its most recent test, with some high performers having held their spots since February.
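Google doesn't spell out its exact formula here, but a minimal sketch, assuming the score is simply the average share of tasks solved over the 10 runs, looks like this. The function name `bench_score` and the task counts below are hypothetical, chosen only for illustration.

```python
from statistics import mean


def bench_score(solved_per_run: list[int], total_tasks: int) -> float:
    """Hypothetical Android Bench-style score: the mean percentage of tasks
    solved per run (assumption: each run attempts the same task set)."""
    if total_tasks <= 0:
        raise ValueError("total_tasks must be positive")
    if not solved_per_run:
        raise ValueError("at least one run is required")
    # Convert each run's solved count to a percentage, then average the runs.
    return mean(100 * solved / total_tasks for solved in solved_per_run)


# Ten hypothetical runs against a hypothetical 50-task set.
runs = [30, 31, 29, 30, 32, 28, 30, 30, 31, 29]
print(bench_score(runs, total_tasks=50))
```

Ten hypothetical runs averaging 30 of 50 tasks solved would yield a score of 60, so a higher score simply means a model solves a larger share of the tasks on a typical run.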

In the latest edition of Android Bench, the results paint a more expensive picture. Gemini 3.5 Flash ranks sixth on the Android Bench list, behind models like GPT 5.5 and Gemini 3.1 Pro Preview, the latter of which was tested in February.