Virtuals integrates Leyten's distributed GPU inference engine to run GLM-5.2 across its AI agent network

Running a model with roughly 744 billion parameters is not something you do on a single graphics card. Virtuals Protocol just partnered with Leyten to make sure it doesn’t have to.

The AI agent platform has integrated Leyten’s shard engine, a system designed to distribute large-model inference across multiple GPUs over a network. The immediate target: running GLM-5.2, the open-weight model from Z.ai that dropped publicly under an MIT license on June 16, 2026. The combination gives Virtuals a path to frontier-scale AI inference without relying on centralized cloud providers or single massive GPU clusters.
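To see why a model this size has to be spread across several GPUs, here is a rough back-of-the-envelope sketch. It is not Leyten's method, which the article does not detail. The 2 bytes per parameter (16-bit weights), the 80 GB per-GPU memory figure, and the function names are all illustrative assumptions, and the estimate ignores activations and the attention cache.

```python
import math

def weight_bytes(total_params, bytes_per_param):
    """Memory needed just to hold the weights, in bytes."""
    return total_params * bytes_per_param

def min_gpus_for_weights(total_params, bytes_per_param, gpu_mem_bytes):
    """Lower bound on GPU count if weights are split evenly (assumption)."""
    return math.ceil(weight_bytes(total_params, bytes_per_param) / gpu_mem_bytes)

def split_layers(num_layers, num_workers):
    """Assign contiguous layer ranges [start, end) to each worker."""
    base, extra = divmod(num_layers, num_workers)
    ranges, start = [], 0
    for worker in range(num_workers):
        size = base + (1 if worker < extra else 0)  # spread the remainder
        ranges.append((start, start + size))
        start += size
    return ranges

# ~744 billion parameters (from the article); 16-bit weights and
# 80 GB GPUs are hypothetical values chosen for illustration.
gpus = min_gpus_for_weights(744 * 10**9, 2, 80 * 10**9)
layer_plan = split_layers(10, 3)  # toy example: 10 layers over 3 workers
```

Under those assumptions the weights alone come to about 1.5 TB, so the lower bound is well into double digits of GPUs before any working memory is counted.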

What GLM-5.2 actually is, and why it matters here

GLM-5.2 is a big model. We’re talking approximately 744 billion total parameters, though only around 39 to 40 billion are active per token. In English: the model uses a mixture-of-experts architecture that keeps most of its knowledge stored but only fires up a fraction of it for any given task, keeping compute costs manageable despite the enormous overall size.
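A toy sketch of the idea follows: a router scores every expert, only the top few run, and the rest sit idle for that token. The expert count, the scores, and the choice of k=2 are made up for illustration and are not GLM-5.2's actual configuration. Only the 744 billion and 39 to 40 billion figures come from the article.

```python
import math

def top_k_route(scores, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(scores[i] - peak) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]

def active_fraction(active_params, total_params):
    """Share of the model's parameters used for a single token."""
    return active_params / total_params

# Hypothetical router scores for 8 experts; only 2 are activated.
experts, weights = top_k_route([0.1, 2.0, -1.0, 1.5, 0.3, 0.0, -0.5, 0.9], k=2)

# Figures from the article: ~39-40B active out of ~744B total.
low = active_fraction(39e9, 744e9)
high = active_fraction(40e9, 744e9)
```

With the article's numbers, a little over 5 percent of the parameters do the work on any given token.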

The model also ships with a context window of 1 million tokens. That’s five times larger than that of its predecessor, GLM-5.1.
