The Problem
A media company needed to evaluate which AI model produced the best podcast-style summaries of news articles. They wanted to:
Send an article to multiple AI models simultaneously
Compare the outputs side by side
Score each output automatically
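
The three requirements above can be sketched as a small fan-out-and-score loop. This is only an illustrative sketch, not the company's actual pipeline: the two model functions are hypothetical stand-ins for real API calls, and the word-overlap score is a toy metric assumed purely for demonstration. `asyncio.gather` runs the model calls concurrently and returns the results in input order.

```python
import asyncio
from typing import Awaitable, Callable

# A "model" here is any async function that maps article text to a summary.
ModelFn = Callable[[str], Awaitable[str]]


async def stub_model_a(article: str) -> str:
    # Hypothetical placeholder: returns the first sentence of the article.
    return article.split(". ")[0].rstrip(".") + "."


async def stub_model_b(article: str) -> str:
    # Hypothetical placeholder: returns the first ten words of the article.
    return " ".join(article.split()[:10])


def score_summary(article: str, summary: str) -> float:
    # Toy metric (an assumption, not a real quality measure):
    # fraction of distinct summary words that also appear in the article.
    article_words = set(article.lower().split())
    summary_words = set(summary.lower().split())
    if not summary_words:
        return 0.0
    return len(summary_words & article_words) / len(summary_words)


async def evaluate(article: str, models: dict[str, ModelFn]) -> list[dict]:
    names = list(models)
    # Fan out: every model receives the same article concurrently.
    summaries = await asyncio.gather(*(models[name](article) for name in names))
    rows = [
        {"model": name, "summary": text, "score": score_summary(article, text)}
        for name, text in zip(names, summaries)
    ]
    # Highest score first, so outputs can be read side by side in rank order.
    return sorted(rows, key=lambda row: row["score"], reverse=True)


def run_demo() -> list[dict]:
    article = (
        "The council approved the budget on Monday. "
        "Critics said the vote was rushed."
    )
    models = {"model_a": stub_model_a, "model_b": stub_model_b}
    return asyncio.run(evaluate(article, models))


if __name__ == "__main__":
    for row in run_demo():
        print(f"{row['model']:<8} {row['score']:.2f}  {row['summary']}")
```

In a real setup, each stub would be replaced by a call to a model provider, and the scoring function by whatever rubric the evaluation actually uses; the fan-out and ranking structure would stay the same.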
