LLM observability tools are blind to the voice layer. Here is what I checked 6 of them for.

Tracing the LLM call is the easy 20 percent. For a voice agent, the failures live in the audio layer your tracer never sees.

Most LLM observability tools trace the same thing: the prompt, the completion, the tokens, the latency of the model call. For a text agent that is most of the story. For a voice agent it is maybe a fifth of it, because the failures that actually make a voice agent feel broken happen in the audio layer, and a tracer pointed at the LLM call cannot see them. I went through six observability tools (Langfuse, Helicone, Arize Phoenix, LangSmith, Braintrust, and Laminar) asking one question each: can it show me the audio layer, or only the LLM call?

The audio layer is where the real spans are. End-of-turn detection: how long did the agent wait before deciding the caller was done? ASR latency and confidence: how long did transcription take, and how sure was it? Barge-in: did the caller interrupt, and did the agent yield? Time-to-first-audio: how long from the caller finishing to the agent making a sound? None of these are LLM-call metrics, and a green LLM-latency dashboard tells you nothing about any of them. I have watched a voice agent with a perfectly healthy model-call trace feel sluggish and rude to every caller, because the lag and the interruptions lived in spans the tracer was not capturing.

Tracing the LLM call is the easy 20 percent. For a voice agent, the failures live in the audio layer your tracer never sees.

LLM observability tools are blind to the voice layer. Here is what I checked 6 of them for.

LLM observability tools are blind to the voice layer. Here is what I checked 6 of them for.

Related reading

Distributed Tracing for LLM Agents: When MCP Makes Tool Calls Observable

LLM observability: Your guide to monitoring AI in production

The 4-layer voice-agent latency stack, traced with OTel spans

LLM Observability Tools for Reliable AI Applications -…

Annotate traces to improve LLM quality with Datadog LLM Observability | Datadog

Understand production LLM behavior with Patterns in Agent Observability |…

Related reading

Distributed Tracing for LLM Agents: When MCP Makes Tool Calls Observable

LLM observability: Your guide to monitoring AI in production

The 4-layer voice-agent latency stack, traced with OTel spans

LLM Observability Tools for Reliable AI Applications -…

Annotate traces to improve LLM quality with Datadog LLM Observability | Datadog

Understand production LLM behavior with Patterns in Agent Observability |…