Breaking the Dense Ceiling: How voyage-4-large Uses MoE to Scale

Author: Hong Liu

Efficient scaling of embedding models has been a core focus of Voyage AI research. Rather than simply scaling up, we aim to improve the quality-cost trade-off, extending the Pareto frontier beyond what is possible with standard architectures. In the Voyage 3.5 series, we pushed the scaling trends of traditional dense embedding models to their practical limits. To push the frontier further, we introduced a mixture-of-experts (MoE) architecture in voyage-4-large.

In this blog post, we are excited to share more insights on how we incorporate MoE and use it to improve scaling efficiency.

First, we discuss the concepts of dense and MoE embedding models.

Next, we walk through the design choices of implementing MoE embedding models, including how we optimize them during voyage-4-large development.


