You can now deploy, run, and fine-tune large language models on Replicate.

We’ve got official versions of FLAN-T5, GPT-J, and LLaMA, and you can also push any other custom model. We’re also releasing a preview of fine-tuning for language models.

Language models can be run with just a couple of lines of code, like any other model on Replicate:
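
Here's a minimal sketch of what that might look like, assuming the `replicate` Python client is installed (`pip install replicate`) and your API token is in the `REPLICATE_API_TOKEN` environment variable. The model reference below is a placeholder, and the input field names (`prompt`, `max_length`) are assumptions that vary by model, so check the model's page for the exact version ID and inputs.

```python
import os

# Placeholder reference: substitute the "owner/name:version" string shown on
# the model's page on Replicate. "<version-id>" is not a real version ID.
MODEL_REF = "replicate/flan-t5-xl:<version-id>"


def build_input(prompt: str, max_length: int = 50) -> dict:
    """Assemble the model input. Field names are assumed and vary by model."""
    return {"prompt": prompt, "max_length": max_length}


def run_model(prompt: str):
    # Imported lazily so build_input works without the client installed.
    import replicate

    # The client reads the API token from the REPLICATE_API_TOKEN env var.
    return replicate.run(MODEL_REF, input=build_input(prompt))


# Only call the API when a token is configured.
if __name__ == "__main__" and os.environ.get("REPLICATE_API_TOKEN"):
    output = run_model("Q: What is the capital of France? A:")
    # Language models commonly return their text in chunks, so join them.
    print("".join(output))
```

The same `replicate.run` call works for any model on Replicate; only the model reference and the input dictionary change.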

You can run them from Python, Node.js, or with an HTTP API, without having to set up servers or GPUs.

Try them out: