Google recently released Gemini 3.5 Flash, an incredibly fast new model. As someone building infrastructure for autonomous agents, I decided to put it through a rigorous crash test on a real-world data aggregation task to see how it handles massive context loads.

The Benchmark Task

The challenge was simple to state but heavy on context. I fed the model a large JSON array containing 208 user objects and gave it the following prompt:

"Extract the users, find those who are over 30 years old and have green eyes, and calculate the exact mathematical average of their weight."
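
For reference, the same aggregation can be computed deterministically, which gives a ground-truth number to check the model's answer against. This is only a minimal sketch: the post does not show the JSON schema, so the field names `age`, `eyeColor`, and `weight`, the helper name `average_weight`, and the sample records are all assumptions for illustration.

```python
from statistics import mean


def average_weight(users, min_age=30, eye_color="green"):
    """Average weight of users strictly older than min_age with the given eye color.

    Field names ("age", "eyeColor", "weight") are assumed, not taken from the
    original dataset. Returns None when no user matches.
    """
    matches = [
        u for u in users
        if u["age"] > min_age and u["eyeColor"] == eye_color
    ]
    if not matches:
        return None
    return mean(u["weight"] for u in matches)


# Hypothetical sample records, standing in for the 208-object array.
sample_users = [
    {"age": 35, "eyeColor": "green", "weight": 70},
    {"age": 41, "eyeColor": "green", "weight": 80},
    {"age": 30, "eyeColor": "green", "weight": 90},  # not over 30, excluded
    {"age": 52, "eyeColor": "brown", "weight": 95},  # wrong eye color, excluded
]

print(average_weight(sample_users))
```

Note that "over 30" is read here as strictly greater than 30, so a user who is exactly 30 is excluded. If the intended reading is "30 or older", the comparison would need to change to `>=`.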
