When upserts don’t update but still write: Debugging Postgres performance at scale

At Datadog, we track the life cycle of millions of ephemeral hosts that report telemetry data to our platform. When a host stops emitting data, we eventually need to clean it up to avoid bloating our metadata store.

To detect inactive hosts, the Datadog team that manages the host metadata store introduced a new upsert to track the last time a host was seen. We expected this new query to have minimal impact. Each host would be updated at most once a day, so even at 25,000 upserts per second, most queries should have been no-ops.

But when we rolled out the new query, disk writes doubled and Write-Ahead Logging (WAL) syncs quadrupled. We discovered that even when an upsert doesn’t change any values, it still locks the conflicting row, which is recorded in the WAL. Given that a Postgres cluster can only have a single writer, there’s a hard limit to how many writes it can handle. The increase in disk writes introduced by the new query was consuming too much of this limited budget and had to be fixed.

In this post, we’ll walk through how we diagnosed the unexpected overhead by inspecting Postgres’s WAL and how we rewrote the query to eliminate the cost without sacrificing correctness.

Designing a low-overhead upsert in Postgres

In this post, we’ll walk through how we diagnosed the unexpected overhead by inspecting Postgres’s WAL and how we rewrote the query to eliminate the cost without sacrificing correctness.

Designing a low-overhead upsert in Postgres

When upserts don’t update but still write: Debugging Postgres performance at scale | Datadog

Other newsrooms on this story

When upserts don’t update but still write: Debugging Postgres performance at scale | Datadog

Other newsrooms on this story

Related reading

How Postgres Writes Data: WAL, Shared Buffers, and the Illusion of Speed

PostgreSQL WAL Bloat: Why Automatic Management Is Often Insufficient?

PostgreSQL WAL Bloat Management: Reclaiming Disk Space in 4 Steps

Database WAL Bloat: How to Overcome the Performance Trap?

Stream Postgres WAL to Redis: Real-Time Read Models with pg2redis

I got tired of writing the same history table boilerplate, so I built a…