Debug and evaluate your AI app from your coding agent with Datadog Agent Observability

Coding agents like Claude Code, Cursor, and Codex CLI handle the coding parts of building an AI application well. The harder work comes after: understanding why a response went wrong, building eval sets that reflect real production behavior, and keeping up with an application that changes faster than any one-off script can. Teams spend 60–80% of their time on evaluation and error analysis, and much of that work needs to be redone every time the stack shifts.

Datadog Agent Observability already captures the telemetry data needed to answer those questions. It traces every prompt and response and runs online evaluations over them. To make that telemetry data usable from inside your coding agent, we’ve built two foundations. The Agent Observability toolset in the Datadog MCP Server gives agents structured access to Agent Observability data. The Pup CLI provides a command-line interface into much of Datadog’s API surface. On top of these foundations, we’re shipping a set of Agent Skills that package common AI engineering tasks into single commands. Drop them into your agent’s skills directory, and your coding agent can classify sessions, debug production failures, and evaluate new versions of your application against real traffic.
