Gemini 3.5 Flash vs Claude Haiku vs GPT-4o mini: Picking a Small Model

The Gemini 3.5 Flash announcement made the rounds on Hacker News this week, and I've been getting pinged by teammates asking whether to migrate our internal tools off Claude Haiku or GPT-4o mini. After spending a weekend running informal evals on classification and code-routing tasks, I have some takes.

Upfront caveat: I'm hedging on specific Gemini 3.5 Flash feature claims because marketing benchmarks and actual API behavior rarely match. Check the official Gemini docs before you bet your architecture on any number, including the ones I quote.

Why migrate small models at all?

Most teams I work with use small, fast models for the boring stuff:

Classification (is this ticket billing, bug, or feature-request?)
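For a sense of what an informal classification eval like this can look like, here is a minimal, provider-agnostic sketch. It is not the harness I actually ran. The `complete` callable is a hypothetical prompt-in, text-out wrapper you would write around whichever SDK you use, and `fake_complete` is a keyword stub that exists only so the example runs offline. The label set comes straight from the ticket example above.

```python
from typing import Callable, Iterable, Tuple

# Labels taken from the ticket example above.
LABELS = ("billing", "bug", "feature-request")


def build_prompt(ticket: str) -> str:
    """Ask for exactly one label so the reply is easy to parse."""
    return (
        "Classify the support ticket into exactly one of: "
        + ", ".join(LABELS)
        + ".\nReply with the label only.\n\nTicket: "
        + ticket
    )


def parse_label(reply: str) -> str:
    """Normalize the reply; return 'unknown' if it is not a known label."""
    cleaned = reply.strip().lower().strip(".\"'` ")
    return cleaned if cleaned in LABELS else "unknown"


def classify(ticket: str, complete: Callable[[str], str]) -> str:
    """`complete` is a hypothetical wrapper around any provider's API."""
    return parse_label(complete(build_prompt(ticket)))


def accuracy(
    examples: Iterable[Tuple[str, str]], complete: Callable[[str], str]
) -> float:
    """Fraction of (ticket, expected_label) pairs classified correctly."""
    pairs = list(examples)
    hits = sum(classify(ticket, complete) == label for ticket, label in pairs)
    return hits / len(pairs)


def fake_complete(prompt: str) -> str:
    """Keyword stub standing in for a real model call (assumption, not an API)."""
    text = prompt.rsplit("Ticket: ", 1)[1].lower()
    if "charge" in text or "invoice" in text:
        return "Billing."
    if "crash" in text or "error" in text:
        return "bug"
    return "feature-request"
```

Swapping models then means swapping the `complete` function and rerunning `accuracy` on the same labeled tickets. Mapping unparseable replies to `unknown` keeps formatting failures visible in the score instead of silently counting them as one of the real labels.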
