Build custom code-based evaluators in Amazon Bedrock AgentCore | Amazon Web Services

In this post, you will implement four Lambda-based custom code evaluators for a financial market-intelligence agent, register each with AgentCore, and run them in on-demand and online modes. You will also see how to combine custom code-based evaluators with built-in evaluators and how to call other AWS services for grounded fact-checking, PII detection, and real-time alerting.

Monday, May 18, 2026

Special thanks to everyone who contributed to this launch: Stephanie Yuan, Lefan Zhang, Ritvika Pillai, Irene Wang, Carter Williams, T.J Ariyawansa, Gitika Jha, Shoaib Javed and the product leadership from Vivek Singh.

Moving prototype agents to production requires measuring quality across multiple dimensions. Amazon Bedrock AgentCore Evaluations provides large language model (LLM)-as-a-Judge checks and extensible code-based evaluators that capture the domain-specific requirements of your agentic application.

In financial services and specialized domains, the critical quality dimensions often extend beyond language. A market-intelligence agent must quote stock prices within a configurable live band, follow a mandatory broker-identification workflow before accessing financial profiles, return tool outputs that conform to a strict JSON schema, and withhold personally identifiable information (PII). These checks require deterministic code that produces the same result on identical input. They can also be expensive to run with LLM-as-a-Judge when an objective piece of code is the straightforward choice.
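To make the idea of deterministic checks concrete, here is a minimal Python sketch of three of the checks described above: a price band, the broker-identification ordering, and a PII scan. The function names, the tool names `identify_broker` and `get_financial_profile`, and the single Social Security number pattern are assumptions chosen for illustration and do not come from the original post. A real PII check would need far broader coverage.

```python
import re
from typing import Sequence

# Hypothetical pattern: a US Social Security number written as 123-45-6789.
# Illustrative only; real PII detection needs many more patterns.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def price_within_band(quoted: float, reference: float, band_pct: float) -> bool:
    """Return True if quoted is within +/- band_pct percent of reference."""
    if reference <= 0:
        return False  # a non-positive reference price cannot define a band
    deviation_pct = abs(quoted - reference) / reference * 100.0
    return deviation_pct <= band_pct


def broker_identified_first(
    tool_calls: Sequence[str],
    identify_tool: str = "identify_broker",        # assumed tool name
    profile_tool: str = "get_financial_profile",   # assumed tool name
) -> bool:
    """Return True if no profile access happens before an identification call."""
    identified = False
    for name in tool_calls:
        if name == identify_tool:
            identified = True
        elif name == profile_tool and not identified:
            return False
    return True


def contains_pii(text: str) -> bool:
    """Return True if the text matches the (illustrative) SSN pattern."""
    return SSN_PATTERN.search(text) is not None
```

Each function is pure: identical input always yields the same result, and no model tokens are spent.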

With custom code-based evaluators, you can bring an AWS Lambda function as the evaluation engine and control the scoring logic: regex and structural validation, external data lookups, calls to other services, or business rules. The same evaluator can be used in multiple ways without requiring foundation model (FM) tokens for each request. In on-demand evaluations, it acts as a gate within development workflows and continuous integration and delivery (CI/CD) pipelines. In online evaluation setups, it can score live production traffic. Because the logic lives in your own Lambda function, you can assess agent quality consistently even when traces come from different agent frameworks.
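As a rough sketch of what such a Lambda function might look like, the example below validates a tool output against a small JSON schema. The event key `tool_output`, the response fields `score`, `label`, and `explanation`, and the required-field schema are all assumptions made for illustration. The actual request and response contract is defined by AgentCore Evaluations and should be taken from its documentation.

```python
import json
from typing import Any, Dict

# Hypothetical schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"ticker": str, "price": (int, float), "currency": str}


def _result(score: float, label: str, explanation: str) -> Dict[str, Any]:
    # Assumed response shape; not the documented AgentCore contract.
    return {"score": score, "label": label, "explanation": explanation}


def validate_tool_output(raw: Any) -> Dict[str, Any]:
    """Deterministically check that raw is JSON with the required fields."""
    try:
        payload = json.loads(raw)
    except (TypeError, ValueError):
        return _result(0.0, "FAIL", "tool output is not valid JSON")
    if not isinstance(payload, dict):
        return _result(0.0, "FAIL", "tool output is not a JSON object")
    for field, expected in REQUIRED_FIELDS.items():
        if field not in payload:
            return _result(0.0, "FAIL", f"missing field: {field}")
        value = payload[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            return _result(0.0, "FAIL", f"wrong type for field: {field}")
    return _result(1.0, "PASS", "tool output conforms to the schema")


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Lambda entry point; reads the assumed 'tool_output' key from the event."""
    return validate_tool_output(event.get("tool_output"))
```

Because the handler is plain Python with no model calls, the same function could back both an on-demand gate in a CI/CD pipeline and online scoring of production traffic.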
