Author: Hong Liu
Efficient scaling of embedding models has been a core focus of Voyage AI research: Rather than simply scaling up, we aim to improve the quality-cost trade-off—extending the Pareto frontier beyond what is possible with standard architectures. In the Voyage 3.5 series, we pushed the scaling trends of traditional dense embedding models to their practical limits. To further push the Pareto frontier, we introduced a mixture of experts (MoE) architecture in voyage-4-large.
In this blog post, we are excited to share more insights on how we incorporate MoE and use it to improve scaling efficiency.
First, we discuss the concepts of dense and MoE embedding models.
Next, we walk through the design choices of implementing MoE embedding models, including how we optimize them during voyage-4-large development.







