The Problem

A media company needed to evaluate which AI model produces the best podcast-style summaries from news articles. They wanted to:

Send an article to multiple AI models simultaneously

Compare the outputs side by side

Score each output automatically
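
The three goals above can be sketched as one small evaluation loop. This is a minimal illustration, not the company's actual system: the functions model_a and model_b are hypothetical stand-ins for real model clients, and score is a toy length-based metric assumed here only to show where automatic scoring would plug in.

```python
import asyncio
from typing import Awaitable, Callable, Dict

# Hypothetical model interface: takes an article, returns a summary.
ModelFn = Callable[[str], Awaitable[str]]


async def model_a(article: str) -> str:
    """Stand-in model (assumption): returns only the first sentence."""
    await asyncio.sleep(0)  # placeholder for a real network call
    return article.split(".")[0].strip() + "."


async def model_b(article: str) -> str:
    """Stand-in model (assumption): echoes the article unchanged."""
    await asyncio.sleep(0)  # placeholder for a real network call
    return article.strip()


def score(summary: str, article: str) -> float:
    """Toy metric (assumption): higher when the summary is shorter than the article."""
    if not article:
        return 0.0
    return round(1.0 - len(summary) / len(article), 3)


async def evaluate(article: str, models: Dict[str, ModelFn]) -> Dict[str, dict]:
    names = list(models)
    # Send the article to every model concurrently.
    summaries = await asyncio.gather(*(models[name](article) for name in names))
    # Collect outputs and scores per model so they can be compared side by side.
    return {
        name: {"summary": summary, "score": score(summary, article)}
        for name, summary in zip(names, summaries)
    }


def run_demo() -> Dict[str, dict]:
    article = "Markets rose today. Analysts cited strong earnings."
    return asyncio.run(evaluate(article, {"model_a": model_a, "model_b": model_b}))


if __name__ == "__main__":
    for name, row in run_demo().items():
        print(f"{name}\t{row['score']}\t{row['summary']}")
```

In a real setup, each stand-in would wrap an actual model API call, and the scoring function would be replaced by whatever quality metric the team settles on.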