Introducing the Batch API: Simpler and more efficient for large-scale workloads

TL;DR: We’re excited to introduce our Batch API, an asynchronous endpoint built to handle large volumes of requests efficiently. It streamlines workflows by removing the complexity of managing many synchronous requests, such as threading and retries. Compared to OpenAI’s Batch API, ours offers a shorter 12-hour completion window, which is ideal for overnight runs and keeps daytime hours productive. It also has higher capacity: files up to 1 GB, up to 100K inputs per batch, and up to 1B tokens per organization. Finally, it comes at a 33% discount, making it a scalable, cost-effective option.
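Because of the per-batch limits above, a large corpus has to be split across several batch files. Below is one way to do that in Python. It assumes a JSON Lines input file in which each line is one request. The record layout (`custom_id`, `input`), the helper name `batch_lines`, and the reading of "1 GB" as 10^9 bytes are illustrative assumptions, not the service's documented format.

```python
import json
from typing import Iterable, Iterator, List

# Per-batch limits stated in the announcement.
MAX_INPUTS_PER_BATCH = 100_000
# Assumption: "1 GB" read as 10**9 bytes; use 2**30 if the service means GiB.
MAX_FILE_BYTES = 1_000_000_000


def batch_lines(texts: Iterable[str],
                max_inputs: int = MAX_INPUTS_PER_BATCH,
                max_bytes: int = MAX_FILE_BYTES) -> Iterator[List[str]]:
    """Group inputs into JSONL batches that respect both per-batch limits.

    The record layout ({"custom_id": ..., "input": ...}) is a hypothetical
    placeholder, not the service's documented request schema.
    """
    current: List[str] = []
    current_bytes = 0
    for i, text in enumerate(texts):
        line = json.dumps({"custom_id": f"req-{i}", "input": text}) + "\n"
        size = len(line.encode("utf-8"))  # measure encoded bytes, not characters
        if size > max_bytes:
            raise ValueError(f"input {i} alone exceeds the file size limit")
        # Close the current batch if adding this line would break either limit.
        if current and (len(current) >= max_inputs or current_bytes + size > max_bytes):
            yield current
            current, current_bytes = [], 0
        current.append(line)
        current_bytes += size
    if current:
        yield current
```

You can write each yielded list to its own `.jsonl` file and submit that file as a separate batch. Measuring the UTF-8 byte length of each line, rather than its character count, keeps the size check correct for non-ASCII text. The sketch does not check the 1B-token per-organization limit, because counting tokens depends on the model's tokenizer.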

Not every workload requires real-time processing. Examples include offline vectorization of large corpora for semantic search and large-scale evaluation runs. To support these use cases, we’re launching our Batch API, an asynchronous endpoint designed to process high volumes of requests efficiently. Compared to our synchronous API, the Batch API:

Simplifies large-scale workflows by removing the need to manage queues, retries, threading, or rate limits.

Offers a 12-hour completion window, half the 24-hour window of market alternatives such as OpenAI’s. This suits overnight runs: jobs submitted in the evening can finish by the next morning, maximizing daytime productivity.
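Because the endpoint is asynchronous, the client no longer juggles threads, queues, and retries. It submits a job and checks back on it. Below is a minimal polling loop. It assumes a caller-supplied `get_status` function (for example, one that wraps an HTTP request for the batch's status) and hypothetical status names; neither comes from the service's documentation. The default timeout matches the 12-hour completion window.

```python
import time
from typing import Callable

# Hypothetical terminal status names; check the service docs for the real ones.
TERMINAL_STATES = {"completed", "failed", "cancelled"}
# Default client timeout set to the 12-hour completion window from the announcement.
COMPLETION_WINDOW_S = 12 * 60 * 60


def wait_for_batch(get_status: Callable[[], str],
                   poll_interval_s: float = 60.0,
                   timeout_s: float = COMPLETION_WINDOW_S,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> str:
    """Poll a batch job until it reaches a terminal state or the timeout elapses.

    `get_status` is a caller-supplied (hypothetical) function returning the
    job's current status string.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"batch still '{status}' after {timeout_s:.0f}s")
        sleep(poll_interval_s)
```

Injecting `sleep` and `clock` lets you test the loop without real waiting. For a job with a 12-hour window, a poll interval of a few minutes is usually enough.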
