This is a submission for the Gemma 4 Challenge: Write About Gemma 4

Gemma 4 ships with native function calling built in — trained from scratch, not prompt-engineered. But "built in" and "tuned for your specific tools" are different things.

If you have a set of internal APIs, a specific tool schema, or edge-case behaviors that the base model handles inconsistently, fine-tuning on your own function-calling data is the right move. TRL (Transformer Reinforcement Learning library) added multimodal tool response support in the same release window as Gemma 4, making this the first time you can fine-tune a multimodal model on tool use — including image outputs from tools.

This guide walks through the full pipeline: data format, fine-tuning with QLoRA, and evaluation.

What TRL's Multimodal Tool Support Actually Adds