Building Reflective Prompt Optimization with GEPA: Multi-Component Prompts, Structured Feedback, and Held-Out Validation

In this tutorial, we use GEPA as a reflective prompt-evolution framework to improve how a small language model solves multi-step arithmetic word problems. We start from a weak seed prompt, build a deterministic benchmark, and define a structured evaluator that returns actionable feedback. A multi-component setup evolves both the instruction field and the output-format rules together. We then compare the baseline and optimized prompts on a held-out validation set to check whether the gains generalize.

domenica 7 giugno 2026 New tab

In this tutorial, we use GEPA as a reflective prompt-evolution framework to improve the way a language model solves arithmetic word problems. We begin with a weak seed prompt, create a small deterministic benchmark, define a structured evaluator, and pass actionable feedback to GEPA so it can understand why a candidate prompt fails. We also use a multi-component prompt setup in which both the instruction field and the output-format rules evolve together. By the end, we compare the baseline prompt with the optimized prompt on a held-out validation set and inspect how the evolutionary process improves performance.

Installing GEPA and LiteLLM and Configuring the Task and Reflection Models

!pip install -q gepa litellm

import os, re, json, random, getpass, textwrap

import litellm

Installing GEPA and LiteLLM and Configuring the Task and Reflection Models

!pip install -q gepa litellm

import os, re, json, random, getpass, textwrap

import litellm

Building Reflective Prompt Optimization with GEPA: Multi-Component Prompts, Structured Feedback, and Held-Out Validation

Building Reflective Prompt Optimization with GEPA: Multi-Component Prompts, Structured Feedback, and Held-Out Validation

Other newsrooms on this story

Related reading

The End of Manual Prompt Engineering: How Genetic-Pareto Prompt Evolution…

Beyond Prompting: Building a 4-Stage LLM Compiler with Surgical Self-Repair

RAG - Prompt Engineering

Prompts as Code: How to Version, Test, and Ship the Prompt Layer in 2026

How a model upgrade silently broke our extraction prompt (and how we caught it)

Your eval criteria are code. Version them like code.

Other newsrooms on this story

Related reading

The End of Manual Prompt Engineering: How Genetic-Pareto Prompt Evolution…

Beyond Prompting: Building a 4-Stage LLM Compiler with Surgical Self-Repair

RAG - Prompt Engineering

Prompts as Code: How to Version, Test, and Ship the Prompt Layer in 2026

How a model upgrade silently broke our extraction prompt (and how we caught it)

Your eval criteria are code. Version them like code.