I've been growing my open-source portfolio one contribution at a time, and this week I landed on something genuinely interesting in livekit/agents (11k+ stars, the framework behind a ton of real-time voice AI agents).

The bug

If you're building a voice agent on a realtime model (OpenAI Realtime, xAI, Gemini Live), the model streams the transcription of the user's speech back in chunks. A single utterance can fire many user_input_transcribed events before its transcript is final: token by token for OpenAI/xAI, or as one big interim blob for Gemini.
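To make that event pattern concrete, here is a minimal simulation. TranscriptEvent and stream_utterance are hypothetical names, not livekit APIs, and the sketch assumes two things the post does not state: each interim event carries the text accumulated so far, and the utterance ends with one extra event flagged is_final.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class TranscriptEvent:
    """Hypothetical, simplified transcription event (not the livekit class)."""
    item_id: str      # assumed to stay the same for every event of one utterance
    transcript: str   # assumption: text accumulated so far
    is_final: bool


def stream_utterance(item_id: str, chunks: List[str]) -> Iterator[TranscriptEvent]:
    """Emit one interim event per chunk, then a single final event (assumed shape)."""
    text = ""
    for chunk in chunks:
        text += chunk
        yield TranscriptEvent(item_id, text, is_final=False)
    yield TranscriptEvent(item_id, text, is_final=True)


# Token-by-token style (OpenAI/xAI in the post): several interim events.
token_events = list(stream_utterance("item_a", ["Hel", "lo ", "there"]))
# One-big-blob style (Gemini in the post): a single interim event.
blob_events = list(stream_utterance("item_b", ["Hello there"]))

print(len(token_events), len(blob_events))  # 4 2
print(token_events[-1].transcript)          # Hello there
```

The provider changes how many interim events you get, but in this sketch every event for one utterance shares the same item_id.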

If you want to react exactly once per utterance (say, show a "user is typing" indicator on your frontend via RPC), you need a stable key that ties all of those interim events to the same utterance.
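Once such a key exists, "once per utterance" becomes a simple set-membership check. The sketch below is one way to do it, assuming each event exposes item_id and is_final. OncePerUtterance and TranscriptEvent are hypothetical names, and in a real app the on_start callback is where you would trigger the frontend RPC.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class TranscriptEvent:
    """Hypothetical stand-in for a transcription event that carries an item_id."""
    item_id: str
    transcript: str
    is_final: bool


class OncePerUtterance:
    """Invoke on_start exactly once per item_id, however many interim events arrive."""

    def __init__(self, on_start: Callable[[str], None]) -> None:
        self._on_start = on_start
        self._active: Set[str] = set()  # utterances already announced, not yet final

    def handle(self, ev: TranscriptEvent) -> None:
        if ev.item_id not in self._active:
            self._on_start(ev.item_id)   # first event seen for this utterance
            self._active.add(ev.item_id)
        if ev.is_final:
            self._active.discard(ev.item_id)  # utterance finished; free the key


started: List[str] = []
gate = OncePerUtterance(started.append)
events = [
    TranscriptEvent("item_a", "Hel", False),
    TranscriptEvent("item_a", "Hello", False),
    TranscriptEvent("item_a", "Hello there", True),
    TranscriptEvent("item_b", "Bye", False),
    TranscriptEvent("item_b", "Bye", True),
]
for ev in events:
    gate.handle(ev)

print(started)  # ['item_a', 'item_b']
```

Dropping the key on the final event keeps the set from growing forever. The trade-off is that a stray interim event arriving after its final one would fire again, so a consumer that expects that could keep a small bounded history instead.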

That key already existed internally: InputTranscriptionCompleted carries an item_id. But when the framework re-emitted that event upward as the public UserInputTranscribedEvent, the item_id was silently dropped, leaving consumers with no reliable way to dedupe across providers.
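The shape of the bug is easy to reproduce in isolation. The dataclasses below borrow the post's class names but are simplified stand-ins whose fields are assumptions, not the real livekit definitions. reemit_buggy and reemit_fixed are hypothetical helpers that contrast dropping the key with forwarding it. Making item_id optional on the public event is also an assumption, chosen here so existing constructors keep working.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InputTranscriptionCompleted:
    """Simplified stand-in for the internal event; fields assumed for this sketch."""
    item_id: str
    transcript: str
    is_final: bool


@dataclass(frozen=True)
class UserInputTranscribedEvent:
    """Simplified stand-in for the public event (not the real livekit definition)."""
    transcript: str
    is_final: bool
    item_id: Optional[str] = None  # hypothetical optional field so older callers still work


def reemit_buggy(ev: InputTranscriptionCompleted) -> UserInputTranscribedEvent:
    # The pattern described in the post: item_id is never copied over.
    return UserInputTranscribedEvent(transcript=ev.transcript, is_final=ev.is_final)


def reemit_fixed(ev: InputTranscriptionCompleted) -> UserInputTranscribedEvent:
    # Forward the correlation key alongside the text.
    return UserInputTranscribedEvent(
        transcript=ev.transcript, is_final=ev.is_final, item_id=ev.item_id
    )


internal = InputTranscriptionCompleted(item_id="item_a", transcript="Hello", is_final=False)
print(reemit_buggy(internal).item_id)  # None
print(reemit_fixed(internal).item_id)  # item_a
```

Everything downstream of reemit_buggy only sees the text, so every consumer would have to fall back on fragile heuristics, such as comparing transcripts, to tell utterances apart.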