In Part 1 of this series, we explored how Karpenter’s architecture enables just-in-time provisioning and active node consolidation. Because Karpenter is constantly making infrastructure decisions based on real-time scheduling pressure, its metrics can give you early warning of provisioning slowdowns, cloud API throttling, and misconfigurations that prevent it from scaling the way you expect. In this post, we’ll show you key metrics you can monitor to understand Karpenter’s behavior and performance. As you collect Karpenter metrics, note that each one is marked as STABLE, BETA, ALPHA, or DEPRECATED. BETA and ALPHA metrics are useful, but their names and labels are more likely to change across versions, so treat those markings as a cue to recheck any dashboards and alerts that depend on them after each upgrade.

Track Karpenter metrics to monitor performance

Karpenter exposes Prometheus-formatted metrics via an HTTP endpoint at /metrics on the Karpenter controller. The default metrics port is 8080, which you can override at install time by setting the METRICS_PORT environment variable.
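To make the endpoint details concrete, here is a minimal Python sketch of how you might build the /metrics URL and list the metric names it declares. It rests on a few assumptions: the helper names (metrics_url, metric_names, fetch_metric_names) are hypothetical, the host defaults to localhost on the assumption that you have already made the controller's metrics port reachable from your machine (for example, with a port-forward), and reading METRICS_PORT from a mapping is a convenience of this sketch that mirrors the controller setting rather than anything Karpenter requires of clients.

```python
import os
import urllib.request

# Default taken from the text above; override it if your install sets METRICS_PORT.
DEFAULT_METRICS_PORT = 8080


def metrics_url(host="localhost", env=None):
    """Build the /metrics URL, honoring a METRICS_PORT override if one is present.

    `env` is any mapping of environment-style settings (hypothetical helper
    argument); it defaults to the local process environment.
    """
    env = os.environ if env is None else env
    port = int(env.get("METRICS_PORT", DEFAULT_METRICS_PORT))
    return f"http://{host}:{port}/metrics"


def metric_names(exposition_text):
    """Return the sorted metric names declared on '# TYPE' lines.

    The Prometheus text format declares each metric as:
        # TYPE <metric_name> <type>
    """
    names = set()
    for line in exposition_text.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[0] == "#" and parts[1] == "TYPE":
            names.add(parts[2])
    return sorted(names)


def fetch_metric_names(url, timeout=5):
    """Fetch the endpoint over HTTP and list the metric names it declares."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return metric_names(resp.read().decode("utf-8"))
```

With the port reachable locally, calling fetch_metric_names(metrics_url()) should return whatever metric names your Karpenter version exposes, which is a quick way to confirm the endpoint and port before wiring up a full collection pipeline.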

You can collect Karpenter metrics in either of two ways. If you use the Prometheus Operator with a ServiceMonitor, first determine the metrics endpoint port by examining the Karpenter service, for example with this command: