Fine-tuned models now boot in less than one second – Replicate blog


Posted September 6, 2023 by andreasjansson

You can fine-tune language models like Llama 2 or image models like SDXL with your own data on Replicate. If you don’t make any requests to your fine-tuned model for a while, it can take some time to start again. This is called a cold boot, and it can take as long as a few minutes for large models.

We’ve made some dramatic improvements to cold boots for fine-tuned models. They now boot in less than one second.

It works on these models:

meta/llama-2-7b-chat

meta/llama-2-13b-chat
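
If you want to see the cold boot for yourself, one rough approach is to time the first request after an idle period and compare it with the requests that follow. The sketch below is only an illustration: `time_requests`, `cold_boot_overhead`, and `FakeModel` are hypothetical names, and `FakeModel` is a local stand-in that simulates a slow first call rather than a real request to Replicate. To measure a real fine-tuned model, you would pass in your own function that sends one prediction request.

```python
import time
from typing import Callable, List


def time_requests(predict: Callable[[], object], n: int = 3) -> List[float]:
    """Call `predict` n times in a row and return each call's duration in seconds."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        predict()
        durations.append(time.perf_counter() - start)
    return durations


def cold_boot_overhead(durations: List[float]) -> float:
    """Estimate cold boot cost as the first call's time minus the fastest later call."""
    if len(durations) < 2:
        raise ValueError("need at least two timings to compare cold and warm calls")
    return durations[0] - min(durations[1:])


class FakeModel:
    """Hypothetical stand-in for a model: the first call pays a one-off boot delay."""

    def __init__(self, boot_seconds: float, predict_seconds: float) -> None:
        self.boot_seconds = boot_seconds
        self.predict_seconds = predict_seconds
        self.warm = False

    def __call__(self) -> str:
        if not self.warm:
            time.sleep(self.boot_seconds)  # simulated cold boot
            self.warm = True
        time.sleep(self.predict_seconds)  # simulated prediction time
        return "ok"


if __name__ == "__main__":
    timings = time_requests(FakeModel(boot_seconds=0.2, predict_seconds=0.01))
    print("per-call seconds:", [round(t, 3) for t in timings])
    print("estimated cold boot overhead:", round(cold_boot_overhead(timings), 3))
```

The estimate includes network and queueing noise when run against a real service, so it is best read as an approximate figure rather than an exact boot time.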
