Author(s): Swaraj Patil

Originally published on Towards AI.

SBERT, also known as Sentence-Bert, is a widely used approach for obtaining sentence embeddings that aim to retain the contextual information within the sentences. However, generating these embeddings can be slow when dealing with large amounts of data. To address this, one option is to utilize batch-based encoding to accelerate the inference. However, this may not necessarily reduce the inference time. In this Medium blog post, we will explore the application of the ONNX (Open Neural Network Exchange) framework and how it aids in reducing the inference time of the model.

P.S. This article does not delve into the internal workings of ONNX. For more in-depth information, please consult the official ONNX documentation.

Let’s begin by installing the import libraries. We can use pip for the installation of ONNX