Collecting images enables us to customize powerful machine learning models in new and exciting ways. For example, some of the text-to-image models on Replicate can be steered using an existing image. This capability is great when we want to steer vision models toward a particular scene or aesthetic, but it requires that we have example images of our own.
I’m Clay, a member of LAION and of the team at Replicate. In this post, I’m going to show you how to use a pip package called clip-retrieval to collect hundreds of images (and captions) from the LAION-5B dataset. We’ll look at how to collect images that either match a text description or have a similar style to some existing images.
clip-retrieval was developed by a fellow member of LAION, Romain Beaumont. It works by embedding the billions of images and captions in the LAION dataset with CLIP. Using the magic of k-NN and autofaiss, we can create an in-memory index over these embeddings with fairly fast retrieval times. If you’re interested in how this works on a technical level, I recommend reading Romain’s article “Semantic search with embeddings: index anything”.
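To make the retrieval idea concrete, here is a minimal sketch of nearest-neighbor search over embeddings. It is not how clip-retrieval or autofaiss is implemented: it is a brute-force cosine-similarity search in plain Python, and the three-dimensional vectors, the file names, and the `knn` helper are all made up for illustration. Real CLIP embeddings have hundreds of dimensions, and an index over billions of them relies on approximate search rather than scanning every item.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def knn(query, index, k=2):
    """Return the ids of the k embeddings most similar to the query (brute force)."""
    q = normalize(query)
    scored = []
    for item_id, embedding in index.items():
        e = normalize(embedding)
        score = sum(a * b for a, b in zip(q, e))  # cosine similarity
        scored.append((score, item_id))
    scored.sort(reverse=True)  # highest similarity first
    return [item_id for _, item_id in scored[:k]]


# Hypothetical toy "index": made-up 3-d vectors standing in for CLIP embeddings.
toy_index = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "kitten.jpg": [0.8, 0.2, 0.1],
    "car.jpg": [0.0, 0.1, 0.9],
}

# Hypothetical query embedding, e.g. what a text prompt might map to.
query_embedding = [1.0, 0.0, 0.0]
nearest = knn(query_embedding, toy_index, k=2)
print(nearest)
```

Because text and images share one embedding space in CLIP, the same search works whether the query vector comes from a caption or from an example image, which is what makes both kinds of collection described in this post possible.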
Getting started
Let’s get started by installing clip-retrieval with pip:

pip install clip-retrieval