Streaming output for language models – Replicate blog

Posted August 14, 2023 by zeke

You know when you’re using ChatGPT or Vercel’s AI playground and it returns an animated response, rendered word by word? That’s not just a dramatic visual effect to make it look like there’s a robot typing on the other side of the conversation. That’s actually the language model generating tokens one at a time, and streaming them back to you while it’s running.

Replicate already provides ways for you to receive incremental updates as your predictions are running, through polling and webhooks. But those aren’t always the most efficient methods to get updates from a running model. When you’re building something like a chat app, what you really need is a live-updating event stream.

Replicate’s API now supports server-sent event streams for language models. This lets you update your app live, as the model is running. In this post we’ll show you how to consume streaming responses from language models on Replicate.

How streaming works
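
To make the idea of a server-sent event stream concrete, here's a minimal sketch of reading one in Python. It relies only on the general server-sent events wire format: `field: value` lines, with a blank line ending each event. The event names `output` and `done`, the sample payloads, and the helper names `parse_sse` and `collect_output` are assumptions made up for illustration, not something taken from Replicate's API.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class ServerSentEvent:
    event: str  # event type; "message" is the default in the SSE format
    data: str   # payload; multiple data lines are joined with "\n"


def parse_sse(lines: Iterable[str]) -> Iterator[ServerSentEvent]:
    """Yield one event per blank-line-terminated block of `field: value` lines."""
    event_type = "message"
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; blocks with no data are dropped.
            if data_lines:
                yield ServerSentEvent(event_type, "\n".join(data_lines))
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format strips a single leading space
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
        # Other fields (id, retry) are ignored in this sketch.


# Hypothetical stream: the event names and payloads are assumptions.
SAMPLE_STREAM = (
    "event: output\ndata: Hello\n\n"
    "event: output\ndata:  world\n\n"
    ": keep-alive\n\n"
    "event: done\ndata: {}\n\n"
)


def collect_output(stream: str) -> str:
    """Concatenate the data of every `output` event until `done` arrives."""
    pieces = []
    for ev in parse_sse(stream.splitlines()):
        if ev.event == "done":
            break
        if ev.event == "output":
            pieces.append(ev.data)
    return "".join(pieces)


if __name__ == "__main__":
    print(collect_output(SAMPLE_STREAM))
```

In a real app you'd feed `parse_sse` the lines of a streaming HTTP response instead of a string, and update the UI on each event rather than collecting everything first.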

