Your health check is a single line. res.send('ok'). It used to take a millisecond. Then traffic ramped up one afternoon and p99 went to 400ms, and you spent the next three hours staring at dashboards that all said the same thing, which was nothing.

CPU is moderate, event loop lag is flat, memory looks healthy, and your APM is reporting that the request took 400ms while telling you nothing about why. No slow database spans, no slow downstream calls, no errors or GC pauses. The time was spent in a place your APM can't see.

The place is the libuv thread pool. Standard Node observability is built around the event loop, and the pool is a different queue with different occupants, sitting just out of reach of every dashboard you have.

What lives on the pool

Node's event loop runs your JavaScript. Work inside Node's built-in APIs that would block the loop, because it's CPU-heavy or because it's blocking I/O on a syscall that has no real async kernel variant, gets pushed to a separate pool of OS threads inside libuv. That pool defaults to four threads. Four, total, for the whole process, regardless of how many cores the machine has.