Your AI can read. Gemma 4 can see

This is a submission for the Gemma 4 Challenge: Write About Gemma 4

Your AI can read. Gemma 4 can see. Here's what that actually changes.

For two years, talking to an AI meant typing. You described things in words, the AI answered in words. If you wanted help with a photo, a handwritten note, or a screenshot, you first had to translate it into a paragraph — and hope you didn't leave out the part that mattered.

Gemma 4 is multimodal, which is a clunky word for a simple idea: you can show it a picture instead of describing one. I spent an afternoon doing exactly that, and the gap between "tell the AI" and "show the AI" turned out to be bigger than I expected.

Here's what multimodal actually means, three things I showed it, and how you can try it yourself in about five minutes — free, no fancy hardware.

Related reading

Why Gemma 4 Feels Like an Important Moment for AI Developers✨

I was trying to Learning About Gemma 4 and It was pretty good

I Used Gemma 4 as a Private Log Analyst for App Crashes

Your AI, Your Device, Your Data - Introducing Aide

I Gave Gemma 4 150 Tools on Windows. Here's What Actually Happened.

Gemma 4 Soft Tokens: The Rise and Fall of 16x16 Words ⚡👀
