Google recently released an incredibly fast new model: Gemini 3.5 Flash. As someone building infrastructure for autonomous agents, I decided to put it through a rigorous crash test on a real-world data aggregation task to see how it handles massive context loads.
The Benchmark Task
The challenge was simple but computationally heavy. I fed the model a large JSON array containing 208 user objects and gave it the following prompt:
"Extract the users, find those who are over 30 years old and have green eyes, and calculate the exact mathematical average of their weight."
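To grade the model's answer, it helps to have a deterministic reference value computed in plain code. Here is a minimal sketch of that calculation in Python. The field names `age`, `eyeColor`, and `weight` are my assumptions about the schema (the actual dataset may use different keys), and the sample records are made up purely for illustration.

```python
from statistics import fmean


def average_weight(users, min_age=30, eye_color="green"):
    """Return (mean weight, match count) for users strictly older than
    min_age whose eye color matches, ignoring letter case.

    Assumed (hypothetical) schema: each user is a dict with the keys
    "age", "eyeColor", and "weight".
    """
    weights = [
        u["weight"]
        for u in users
        if isinstance(u.get("age"), (int, float))
        and u["age"] > min_age  # "over 30" excludes exactly 30
        and str(u.get("eyeColor", "")).lower() == eye_color
        and isinstance(u.get("weight"), (int, float))
    ]
    # With no matches there is nothing to average, so return None
    mean = fmean(weights) if weights else None
    return mean, len(weights)


# Illustrative records only; not taken from the benchmark dataset
sample_users = [
    {"age": 35, "eyeColor": "Green", "weight": 70.0},
    {"age": 30, "eyeColor": "Green", "weight": 90.0},  # exactly 30, excluded
    {"age": 42, "eyeColor": "Brown", "weight": 80.0},  # wrong eye color
    {"age": 51, "eyeColor": "green", "weight": 80.0},
]

mean, count = average_weight(sample_users)
print(f"{count} matching users, average weight {mean}")
```

For the real run, the list would come from something like `json.load()` on the input file, and the resulting mean becomes the number to compare the model's output against.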