Improved Batch Inference API: Enhanced UI, Expanded Model Support, and 3000× Rate Limit Increase

Our new Batch Inference API makes large-scale AI workloads simpler, faster, and cheaper. With a streamlined UI, universal model support, and 3000× higher rate limits—now up to 30B tokens—you can process massive datasets at half the cost of real-time APIs.

Sunday, May 17, 2026

We've rolled out major improvements to our Batch Inference API, making it simpler, faster, and more powerful for teams processing massive datasets.

What's New

Streamlined UI
Create and track batch jobs in an intuitive interface, with no complex API calls required.

Universal Model Access
The Batch Inference API now supports all serverless models and private deployments, so you can run batch workloads on exactly the models you need.

Massive Scale Jump
Rate limits are up from 10M to 30B enqueued tokens per model per user, a 3000× increase. Need more? We'll work with you to customize.

Lower Cost
For most serverless models, the Batch Inference API runs at 50% of the cost of our real-time API, making it the most economical way to process high-throughput workloads.

Batch Inference API in Action

"We rely on the Batch Inference API to process very large amounts of requests. The high rate limits, up to 30B enqueued tokens, let us run massive experiments without bottlenecks, and jobs consistently finish well under the 24-hour SLA, often within just hours. It's transformed the pace at which we can test and iterate."
Volodymyr Kuleshov, Co-Founder, Inception Labs

Inception Labs is one of many teams leveraging the Batch Inference API to accelerate experimentation and production workloads. From research datasets to customer-facing applications, Batch enables large-scale processing that simply wasn't feasible before.

Ideal Use Cases

The Batch Inference API is perfect when you need high throughput without real-time constraints:

- Large-scale text analysis: Sentiment analysis, document classification, content tagging
- Fraud detection: Scan millions of transactions for anomalies
- Synthetic data generation: Create massive training datasets
- Embedding generation: Turn large corpora into vector representations
- Content moderation: Process user-generated content at scale
- Model evaluation: Run large benchmark suites
- Customer support automation: Handle tickets with longer SLAs efficiently

Looking Ahead

These updates mark a major step forward in making large-scale inference both accessible and cost-effective. With an upgraded UI, universal model support, and dramatically higher rate limits, all at typically half the cost of real-time APIs, the Batch Inference API is the most efficient way to handle massive workloads.

Try the Batch Inference API today and start scaling your experiments without limits.
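For a rough sense of what the numbers above mean for a given workload, the arithmetic can be sketched in a few lines of Python. This is a back-of-the-envelope illustration, not part of the Batch Inference API: the function `plan_batch`, the `BatchPlan` type, and the example price of 0.50 per million tokens are all hypothetical, and only the 10M and 30B limits and the 50% figure come from the announcement (the 50% applies to most, not all, serverless models).

```python
from dataclasses import dataclass

# Figures taken from the announcement.
OLD_LIMIT_TOKENS = 10_000_000        # previous limit: 10M enqueued tokens
NEW_LIMIT_TOKENS = 30_000_000_000    # new limit: 30B enqueued tokens per model per user
BATCH_COST_RATIO = 0.5               # batch cost relative to real-time, for most serverless models


@dataclass
class BatchPlan:
    total_tokens: int      # estimated tokens enqueued for the whole workload
    fits_in_limit: bool    # True if the workload fits under the 30B limit at once
    realtime_cost: float   # estimated cost at the (assumed) real-time price
    batch_cost: float      # estimated cost at the batch ratio


def plan_batch(num_requests: int,
               avg_tokens_per_request: int,
               realtime_price_per_million: float) -> BatchPlan:
    """Estimate size and cost of a batch workload.

    realtime_price_per_million is a hypothetical input: look up the actual
    price for the model you intend to use.
    """
    total = num_requests * avg_tokens_per_request
    realtime_cost = total / 1_000_000 * realtime_price_per_million
    return BatchPlan(
        total_tokens=total,
        fits_in_limit=total <= NEW_LIMIT_TOKENS,
        realtime_cost=realtime_cost,
        batch_cost=realtime_cost * BATCH_COST_RATIO,
    )


if __name__ == "__main__":
    # The headline multiplier: 30B / 10M.
    print(NEW_LIMIT_TOKENS // OLD_LIMIT_TOKENS)  # 3000

    # Example: 5 million requests averaging 2,000 tokens each,
    # at an assumed real-time price of 0.50 per million tokens.
    print(plan_batch(5_000_000, 2_000, 0.50))
```

In this example the workload totals 10B tokens, so it fits under the new limit in a single pass, whereas under the old 10M limit it would have had to be enqueued in a thousand slices.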

