Microsoft open sources AI evaluation framework for enterprise agents

Microsoft has open-sourced an AI evaluation framework that converts natural-language requirements into executable tests, expanding its push into enterprise AI governance as organizations struggle to systematically validate agent behavior before production deployment.

The framework, called ASSERT (Adaptive Spec-driven Scoring for Evaluation and Regression Testing), generates evaluation scenarios, datasets, metrics, and scorecards from written specifications, product requirements, and governance documents, Microsoft said in a blog post announcing the release.

“Agents fail in ways that are hard to see,” Microsoft wrote in the blog post. “They drift from policy, produce unsafe outputs in edge cases, and behave differently in production than they did in testing. Generic benchmarks do not catch these failures because they are not built around your policies, your agent, or your use case.”

Rather than requiring developers to create evaluation suites by hand, ASSERT translates written intent into reusable tests that can be integrated into AI development pipelines, the company said.

With ASSERT, Microsoft is entering an increasingly competitive AI evaluation market that already includes platforms such as LangChain’s LangSmith, Braintrust, Patronus AI, Galileo, Arize AI’s Phoenix, and Promptfoo, which help enterprises benchmark, monitor, and validate large language model applications.
