Vector search underpins most retrieval-augmented generation (RAG) pipelines. At scale, it gets expensive. Storing 10 million document embeddings in float32 consumes 31 GB of RAM. For dev teams running local or on-premise inference, that number creates real constraints.

A new open-source library called turbovec addresses this directly. It is a vector index written in Rust with Python bindings. It is built on TurboQuant, a quantization algorithm from Google Research. The same 10-million-document corpus fits in 4 GB with turbovec. On ARM hardware, search speed beats FAISS IndexPQFastScan by 12–20%.
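The two memory figures can be sanity-checked with simple arithmetic. The embedding dimension and bit width are not stated above, so the sketch below assumes 768-dimensional vectors and 4 bits per coordinate for the compressed index; both values are assumptions chosen because they reproduce the quoted numbers. The helper name `index_size_gb` is hypothetical, and the estimate ignores any per-vector metadata an index may store.

```python
def index_size_gb(num_vectors: int, dim: int, bits_per_coord: int) -> float:
    """Raw payload size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_vectors * dim * bits_per_coord / 8 / 1e9


NUM_VECTORS = 10_000_000
DIM = 768  # assumption: the article does not state the embedding dimension

float32_gb = index_size_gb(NUM_VECTORS, DIM, 32)   # uncompressed float32
four_bit_gb = index_size_gb(NUM_VECTORS, DIM, 4)   # assumed 4 bits per coordinate

print(f"float32: {float32_gb:.2f} GB")   # float32: 30.72 GB
print(f"4-bit:   {four_bit_gb:.2f} GB")  # 4-bit:   3.84 GB
```

Under these assumptions the ratio is a flat 8x, since each coordinate shrinks from 32 bits to 4.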

The TurboQuant Paper

TurboQuant was introduced by Google Research, which proposes it as a data-oblivious quantizer: one whose codebook does not depend on the vectors being indexed. It achieves near-optimal distortion rates across all bit-widths and dimensions, and it requires zero training and zero passes over the data.

Most production-grade vector quantizers, including FAISS’s Product Quantization, require a codebook training step. You must run k-means over a representative sample of your vectors before indexing begins. If your corpus grows or shifts, you may need to retrain and rebuild the index entirely. TurboQuant skips all of that. It relies on an analytical property of rotated vectors instead of data-dependent calibration.
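To make the contrast with codebook training concrete, here is a minimal sketch of a data-oblivious quantizer in the spirit described above. It is not turbovec's code and not the paper's exact algorithm. All function names are hypothetical, and the rotation is a plain Gram-Schmidt construction. The codebook uses Gaussian quantile midpoints as a stand-in, on the assumption that the coordinates of a randomly rotated unit vector in dimension d are roughly N(0, 1/d). The point is only that the codebook is computed from the dimension and bit width, never from the corpus.

```python
import math
import random
from statistics import NormalDist


def random_rotation(dim, seed=0):
    """Random orthonormal matrix (list of rows) via Gram-Schmidt on Gaussian vectors."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for u in rows:  # subtract the component along each earlier row
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip the (vanishingly rare) degenerate draw
            rows.append([a / norm for a in v])
    return rows


def fixed_codebook(dim, bits):
    """Quantization levels taken from N(0, 1/dim) quantiles; no data is consulted."""
    n_levels = 2 ** bits
    dist = NormalDist(0.0, 1.0 / math.sqrt(dim))
    return [dist.inv_cdf((i + 0.5) / n_levels) for i in range(n_levels)]


def quantize(vec, rotation, codebook):
    """Return (norm, codes). Assumes vec is non-zero."""
    norm = math.sqrt(sum(a * a for a in vec))
    unit = [a / norm for a in vec]
    rotated = [sum(r * a for r, a in zip(row, unit)) for row in rotation]
    # Snap each rotated coordinate to its nearest fixed level.
    codes = [
        min(range(len(codebook)), key=lambda k: abs(codebook[k] - y))
        for y in rotated
    ]
    return norm, codes


def dequantize(norm, codes, rotation, codebook):
    """Approximate reconstruction: look up levels, undo the rotation, rescale."""
    rotated = [codebook[c] for c in codes]
    dim = len(rotated)
    # The inverse of an orthonormal matrix is its transpose.
    return [
        norm * sum(rotation[i][j] * rotated[i] for i in range(dim))
        for j in range(dim)
    ]


# Usage: both the rotation and the codebook exist before any vector is seen.
DIM, BITS = 32, 4
rotation = random_rotation(DIM, seed=1)
codebook = fixed_codebook(DIM, BITS)
vector = [math.sin(i) + 0.1 * i for i in range(DIM)]
norm, codes = quantize(vector, rotation, codebook)
approx = dequantize(norm, codes, rotation, codebook)
```

Because nothing here is fitted to the data, adding new vectors only requires calling `quantize` on them. A k-means-based quantizer, by contrast, may need its codebook retrained when the corpus shifts.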