Multimodal Embedding & Reranker Models with Sentence Transformers

Sentence Transformers is a Python library for using and training embedding and reranker models, with applications such as retrieval-augmented generation and semantic search. With the v5.4 update, you can now encode and compare texts, images, audio, and videos using the same familiar API. In this blogpost, I'll show you how to use these new multimodal capabilities for both embedding and reranking.

Multimodal embedding models map inputs from different modalities into a shared embedding space, while multimodal reranker models score the relevance of mixed-modality pairs. This opens up use cases like visual document retrieval, cross-modal search, and multimodal RAG pipelines.
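To make the distinction concrete, here is a minimal, library-free sketch of the two roles. It does not use the Sentence Transformers API: the vectors are hand-written toy values standing in for real model outputs, the file names are hypothetical, and `toy_pair_score` is a made-up stand-in for a reranker. It only illustrates that an embedding model lets you compare independently encoded items by vector similarity, while a reranker scores each (query, candidate) pair directly.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def retrieve(query_vec: Vector, corpus: Dict[str, Vector], top_k: int) -> List[str]:
    """Embedding stage: rank items by similarity in the shared space."""
    ranked = sorted(corpus, key=lambda name: cosine(query_vec, corpus[name]), reverse=True)
    return ranked[:top_k]


def rerank(query: str, candidates: List[str], score_pair: Callable[[str, str], float]) -> List[str]:
    """Reranker stage: score every (query, candidate) pair, then sort."""
    return sorted(candidates, key=lambda c: score_pair(query, c), reverse=True)


# Toy vectors (assumption): in practice these come from a multimodal
# embedding model that encodes images, audio, and text into one space.
corpus = {
    "photo_of_cat.jpg": [0.9, 0.1, 0.0],
    "dog_barking.wav": [0.1, 0.9, 0.1],
    "cat_care_guide.txt": [0.7, 0.2, 0.3],
}
query_text = "how do I look after a cat?"
query_vec = [1.0, 0.0, 0.1]  # toy embedding of query_text

# Toy pair scores (assumption): a real reranker would compute these by
# reading the query and the candidate together.
toy_scores = {"photo_of_cat.jpg": 0.6, "cat_care_guide.txt": 0.8}


def toy_pair_score(query: str, candidate: str) -> float:
    return toy_scores[candidate]


candidates = retrieve(query_vec, corpus, top_k=2)
reranked = rerank(query_text, candidates, toy_pair_score)
print(candidates)
print(reranked)
```

In this toy setup the embedding stage shortlists the two cat-related items regardless of their modality, and the pairwise stage then reorders that shortlist. That retrieve-then-rerank split is the usual reason to combine both model types in a pipeline.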

If you want to train your own multimodal models, check out the companion blogpost: Training and Finetuning Multimodal Embedding & Reranker Models with Sentence Transformers.

Related reading

Training and Finetuning Multimodal Embedding & Reranker Models with Sentence…

Building Semantic Search with Transformers.js and Sentence Embeddings -…

Introducing the Ettin Reranker Family

Multimodal Browser AI with Transformers.js for Images and Speech -…

Multimodal Embeddings and RAG: A Practical Guide | Weaviate

voyage-multimodal-3.5: a new multimodal retrieval frontier with video support
