Stop Wasting Tokens on Android Automation

Most LLM-driven Android automation starts by showing the model a screen.

That sounds reasonable. A human looks at the phone, decides what to tap, and taps it. Give the model the same view.

The problem is that "the same view" is expensive.

A full screenshot is expensive. A raw Android UI XML dump is also expensive, just in a quieter way. The model reads thousands of tokens of layout machinery before it reaches the handful of labels that matter: