In this tutorial, we use GEPA as a reflective prompt-evolution framework to improve the way a language model solves arithmetic word problems. We begin with a weak seed prompt, create a small deterministic benchmark, define a structured evaluator, and pass actionable feedback to GEPA so it can understand why a candidate prompt fails. We also use a multi-component prompt setup in which both the instruction field and the output-format rules evolve together. By the end, we compare the baseline prompt with the optimized prompt on a held-out validation set and inspect how the evolutionary process improves performance.
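To make these pieces concrete before any setup, here is a minimal sketch in plain Python of a two-component seed candidate, a seeded (and therefore reproducible) benchmark generator, and an evaluator that returns both a score and textual feedback. It does not call GEPA or any model. The names `seed_candidate`, `make_benchmark`, and `evaluate`, the field names `instruction` and `format_rules`, and the `ANSWER: <number>` convention are all assumptions made for illustration, not necessarily what the tutorial's own code uses.

```python
import random
import re

# Hypothetical two-component candidate: both fields would evolve together.
seed_candidate = {
    "instruction": "Solve the problem.",
    "format_rules": "End with a line of the form 'ANSWER: <number>'.",
}


def make_benchmark(n, seed=0):
    """Build n arithmetic word problems; a fixed seed makes the set reproducible."""
    rng = random.Random(seed)
    items = []
    for _ in range(n):
        a, b, c = rng.randint(3, 20), rng.randint(3, 20), rng.randint(2, 9)
        question = (
            f"A shop has {a} boxes with {b} pens each. "
            f"It then sells {c} pens. How many pens remain?"
        )
        items.append({"question": question, "answer": a * b - c})
    return items


def evaluate(output, expected):
    """Score a model output and explain the failure mode in words."""
    match = re.search(r"ANSWER:\s*(-?\d+)", output)
    if match is None:
        return {
            "score": 0.0,
            "feedback": "No 'ANSWER: <number>' line found; the format rules were ignored.",
        }
    got = int(match.group(1))
    if got == expected:
        return {"score": 1.0, "feedback": "Correct answer in the required format."}
    return {
        "score": 0.0,
        "feedback": f"Format is fine, but the answer {got} does not match the expected {expected}.",
    }
```

The feedback strings separate a formatting failure from an arithmetic one, which is the kind of signal a reflective optimizer can use to decide which prompt component to revise.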

Installing GEPA and LiteLLM, and Configuring the Task and Reflection Models

!pip install -q gepa litellm

import os, re, json, random, getpass, textwrap

import litellm