Nebius AI Cloud is a full-stack platform purpose-built for training and deploying AI models at scale. It provides on-demand and reserved GPU clusters, combining bare-metal performance with cloud-native simplicity. Teams running these workloads need visibility into GPU compute, training jobs, inference services, and the LLM applications running on top of them.
The Datadog integration for Nebius AI Cloud consolidates telemetry data that would otherwise be scattered across disconnected tools. The integration centralizes Nebius logs, the Datadog Agent collects metrics and Application Performance Monitoring (APM) traces from your compute instances, and Datadog Agent Observability libraries trace your LLM applications. If you run Nebius alongside other cloud providers, you can monitor your entire environment from a single platform.
In this post, you’ll learn how to:
- Centralize Nebius AI Cloud logs for faster incident triage
- Deploy the Datadog Agent on Nebius compute instances
