TL;DR – We’re excited to introduce our Batch API—an asynchronous endpoint built to efficiently handle large volumes of requests. It streamlines workflows by removing the complexities of managing many synchronous requests, such as threading and retries. Compared to OpenAI, our Batch API offers a shorter 12-hour completion window, ideal for overnight runs and maximizing daytime productivity, along with higher throughput: up to 1 GB file sizes, 100K inputs per batch, and 1B tokens per organization. It’s also offered at a 33% cost savings, making it a scalable and cost-effective option.

Not every workload requires real-time processing—for example, offline vectorizing of large corpora for semantic search or running large-scale evaluations. To support these use cases, we’re launching our Batch API, an asynchronous endpoint designed to efficiently process high volumes of requests. Compared to our synchronous API, the Batch API:

Simplifies large-scale workflows by eliminating the need to manage queues, retries, threading, or rate limits. It also offers a 12-hour completion window, half that of market alternatives, including OpenAI. This is great for overnight runs: submit a job at the end of the day and have results ready the next morning, maximizing daytime productivity.
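
To make the batch limits concrete, here is a minimal Python sketch of how a large corpus might be split into batch files before submission. It uses only the limits quoted in this post (100K inputs per batch, 1 GB per file). Everything else is an assumption for illustration: the one-JSON-object-per-line layout, the `custom_id` and `input` field names, and the helper names `to_request_line` and `split_into_batches` are hypothetical, not the Batch API's documented schema.

```python
import json
from typing import Iterable, Iterator, List

MAX_INPUTS_PER_BATCH = 100_000   # limit quoted in this post
MAX_FILE_BYTES = 1_000_000_000   # "1 GB", assumed here to mean 10**9 bytes


def to_request_line(custom_id: str, text: str) -> str:
    """Serialize one input as a JSON line (field names are hypothetical)."""
    return json.dumps({"custom_id": custom_id, "input": text}, ensure_ascii=False)


def split_into_batches(
    texts: Iterable[str],
    max_inputs: int = MAX_INPUTS_PER_BATCH,
    max_bytes: int = MAX_FILE_BYTES,
) -> Iterator[List[str]]:
    """Yield lists of JSON lines, each list staying within both limits."""
    batch: List[str] = []
    size = 0
    for i, text in enumerate(texts):
        line = to_request_line(f"doc-{i}", text)
        line_bytes = len(line.encode("utf-8")) + 1  # +1 for the trailing newline
        if line_bytes > max_bytes:
            raise ValueError(f"input {i} alone exceeds the file size limit")
        # Start a new batch when adding this line would break either limit.
        if batch and (len(batch) >= max_inputs or size + line_bytes > max_bytes):
            yield batch
            batch, size = [], 0
        batch.append(line)
        size += line_bytes
    if batch:
        yield batch


if __name__ == "__main__":
    # Tiny demonstration with an artificially small input limit.
    for n, lines in enumerate(split_into_batches(["a", "b", "c"], max_inputs=2)):
        print(f"batch {n}: {len(lines)} inputs")
```

Each yielded list can be written to its own file, one line per input, and submitted as a separate batch. The actual request format and upload steps should be taken from the Batch API documentation.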