Closing the verification loop: Observability-driven harnesses for building with agents

AI agents can now produce software faster than any team can verify it. The bottleneck has moved from writing code to trusting what was written.

We have seen this pattern before. Early programmers resisted compilers because they could write better assembly by hand. Often they were right. Compilers earned trust because the languages they translate have precise semantics: The programmer defines what the program does; the compiler has freedom over how it is implemented. Automation has consistently won only when paired with verification.

With AI agents, building trust is harder than it was for compilers. AI agents ingest unrestricted natural language, sometimes from untrusted sources, and translate it into running code. We must find new ways to verify the outputs of these new program synthesis engines.

At Datadog, we see this as our opportunity: preventing “vibe-coding” from spiraling into “yolo-deploys.” Our approach is harness-first engineering: instead of reading every line of agent-generated code, invest in automated checks that can tell us with high confidence, in seconds, whether the code is correct. The agent generates code, the harness verifies it, production telemetry validates it, and if something is wrong, the feedback updates the harness and the agent tries again. The methods for building harnesses vary in rigor (deterministic simulation testing, formal specifications, shadow evaluation, observability-driven feedback loops), but the principle remains the same: make verification fast and automatic, and let the harness do the work that human review cannot scale to do.
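The generate, verify, validate, and retry cycle described above can be sketched in a few lines of Python. This is only an illustrative toy, not Datadog's implementation: the names `Harness`, `run_loop`, `toy_agent`, and `toy_telemetry` are hypothetical, candidates are plain strings rather than real code changes, and "production telemetry" is simulated by a function that returns a new check when it sees a regression.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# A check inspects a candidate and returns a failure message, or None if it passes.
Check = Callable[[str], Optional[str]]


@dataclass
class Harness:
    """Hypothetical harness: a list of fast, automatic checks."""
    checks: List[Check] = field(default_factory=list)

    def verify(self, candidate: str) -> List[str]:
        """Run every check and collect the failure messages."""
        results = (check(candidate) for check in self.checks)
        return [message for message in results if message is not None]


def run_loop(generate, harness, validate_in_production, max_attempts=5):
    """Generate, verify, validate; on failure, feed the findings back and retry."""
    feedback: List[str] = []
    for attempt in range(1, max_attempts + 1):
        candidate = generate(feedback)
        failures = harness.verify(candidate)
        if failures:
            feedback = failures  # the harness rejected it before any deploy
            continue
        new_check = validate_in_production(candidate)
        if new_check is None:
            return candidate, attempt  # passed both the harness and telemetry
        # Telemetry found a problem the harness missed: encode it as a
        # permanent check so the same regression is caught in seconds next time.
        harness.checks.append(new_check)
        feedback = harness.verify(candidate)
    return None, max_attempts


# --- Toy stand-ins (assumptions for illustration only) ---
def no_todo(candidate: str) -> Optional[str]:
    return "candidate still contains a TODO" if "TODO" in candidate else None


def no_subtraction(candidate: str) -> Optional[str]:
    return "telemetry: wrong totals observed after deploy" if "a - b" in candidate else None


def toy_agent(feedback: List[str]) -> str:
    # Stand-in for an agent that reacts to feedback messages.
    if not feedback:
        return "return a + b  # TODO: handle overflow"
    if any("TODO" in message for message in feedback):
        return "return a - b"
    return "return a + b"


def toy_telemetry(candidate: str) -> Optional[Check]:
    # Stand-in for production validation: returns a new check on regression.
    return no_subtraction if "a - b" in candidate else None


harness = Harness(checks=[no_todo])
result = run_loop(toy_agent, harness, toy_telemetry)
```

In this toy run the first candidate is rejected by the harness, the second passes the harness but fails simulated production validation, and the third is accepted. The point of the design is that the harness ends the run with one more check than it started with, so the production failure becomes a pre-deploy failure from then on.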
