Baidu Releases Unlimited OCR, a 3B Model That Keeps the KV Cache Flat for Long-Document Parsing

Baidu's Unlimited OCR replaces decoder attention with R-SWA, keeping the KV cache constant to parse dozens of pages.

Thursday, June 25, 2026

Most end-to-end OCR models slow down as their output grows. Each generated token adds an entry to the KV cache, so memory rises, generation drags, and parsing dozens of pages becomes impractical. Baidu’s Unlimited OCR addresses this directly by swapping the decoder’s attention for a design that keeps memory constant.
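To make that growth concrete, here is a back-of-the-envelope sketch of KV cache size under ordinary full attention. The layer count, KV head count, head dimension, and 2-byte precision are illustrative assumptions, not Unlimited OCR's published configuration, and `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(num_tokens, num_layers=24, num_kv_heads=8,
                   head_dim=128, bytes_per_value=2):
    """Bytes held by cached keys and values for `num_tokens` positions.

    All default dimensions are hypothetical, chosen only for illustration.
    """
    # Factor 2: one key vector and one value vector per token, layer, and head.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * num_tokens


# Under full attention the cache grows linearly with the tokens kept.
for tokens in (1_000, 8_000, 32_000):
    mib = kv_cache_bytes(tokens) / 2**20
    print(f"{tokens:>6} cached tokens -> {mib:,.1f} MiB")
```

Whatever the real dimensions are, the cost per cached token is fixed, so the only way to keep total memory flat is to cap how many tokens stay in the cache.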

TL;DR

Unlimited OCR is a 3B-parameter Mixture-of-Experts model with only 500M parameters active per token.

It replaces decoder attention with Reference Sliding Window Attention (R-SWA), keeping the KV cache constant.

The model parses dozens of pages in one forward pass under a 32K maximum length.
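The summary above does not spell out how R-SWA works internally. Judging only by the name, one plausible reading is that the decoder keeps a fixed set of reference entries plus a sliding window over the most recently generated tokens. The sketch below tracks cache occupancy under that assumption. `BoundedKVCache`, `occupancy`, `num_reference`, and `window` are hypothetical names, and the sizes are made up.

```python
from collections import deque


class BoundedKVCache:
    """Toy cache: fixed reference entries plus a window of recent tokens.

    This is an assumed reading of R-SWA, not Baidu's implementation.
    Entries are placeholder token ids rather than real key/value tensors.
    """

    def __init__(self, reference, window):
        self.reference = list(reference)     # assumed to never be evicted
        self.recent = deque(maxlen=window)   # oldest entry drops automatically

    def append(self, token_id):
        self.recent.append(token_id)

    def __len__(self):
        return len(self.reference) + len(self.recent)


def occupancy(num_generated, num_reference=256, window=512):
    """Return (bounded cache size, full-attention cache size) after decoding."""
    cache = BoundedKVCache(range(num_reference), window)
    full = num_reference  # a full-attention cache keeps every position
    for token_id in range(num_generated):
        cache.append(token_id)
        full += 1
    return len(cache), full


bounded, full = occupancy(10_000)
print(f"bounded cache: {bounded} entries, full cache: {full} entries")
```

In this toy model the bounded cache stops growing once the window fills, while the full-attention count keeps rising with every generated token.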
