This is a submission for the Gemma 4 Challenge: Write About Gemma 4
Your AI can read. Gemma 4 can see. Here's what that actually changes.
For two years, talking to an AI meant typing. You described things in words, the AI answered in words. If you wanted help with a photo, a handwritten note, or a screenshot, you first had to translate it into a paragraph — and hope you didn't leave out the part that mattered.
Gemma 4 is multimodal, which is a clunky word for a simple idea: you can show it a picture instead of describing one. I spent an afternoon doing exactly that, and the gap between "tell the AI" and "show the AI" turned out to be bigger than I expected.
Here's what multimodal actually means, three things I showed it, and how you can try it yourself in about five minutes, for free and with no fancy hardware.
