Code review has traditionally been manual and of limited effectiveness because of the inherent disconnect between code and product. Developers could review whether code compiled and worked, but not whether it fulfilled all functional and design requirements. QA teams spent hours manually clicking through preview environments to ensure features behaved as expected, and even more time aligning implementations with design intent. This manual validation slowed delivery, introduced inconsistency, and increased the likelihood of regressions. As the velocity of development teams increased, Baz wanted to automate this missing layer of verification, bringing intent, behavior, and implementation into a single review workflow.

This post walks through how Baz built their Spec Review agent using Amazon Bedrock and Amazon Bedrock AgentCore. We’ll cover the architecture decisions, implementation details, and the business outcomes they achieved by using these AWS services to automate their code review process.

The key problems Baz is trying to solve

Baz is built to move beyond traditional, diff-only reviews and toward validating whether a feature meets its intended product requirements. Early on, Baz saw that teams struggled with reviews that focused on syntax rather than behavior, leaving critical questions such as “does it work?”, “does it match the spec?”, and “does it behave as intended?” to be answered manually and late in the process. This gap between code and product intent slowed teams down, created design inconsistencies, and required heavy reliance on undocumented internal QA knowledge. Baz set out to close this gap by building agents that could evaluate not just code, but the actual delivered experience.