Evaluate AI agent quality with LLM-as-Judge and trajectory analysis. Catch silent failures, wasted tokens, and hallucinations before production. Python tutorial with code.