What: Unlimited OCR (Baidu, arXiv 2606.23050) is a 3-billion-parameter open OCR model whose decoder replaces standard attention with Reference Sliding Window Attention (R-SWA), the trick that lets it transcribe 40+ pages in a single pass.
Why: The KV cache is the memory that grows with every token a model writes; on a 40-page transcription that growth can dominate inference memory and slow generation, so holding the cache constant is what makes one-pass, whole-document OCR practical.
vs prior: A standard decoder makes each new token attend to the entire growing output, so its KV cache grows linearly with the length of the transcription; R-SWA makes each token attend to the fixed document plus only the last 128 output tokens, so the cache stays a constant size once that window is full.
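The contrast can be sketched in a few lines of Python. This is a minimal illustration, not the released implementation: the function names, the sequence layout (document tokens first, then output tokens), and the choice to count the window over previously written tokens only are assumptions. Only the 128-token window comes from the description above.

```python
from typing import List, Optional

WINDOW = 128  # output-token window size, as stated in the description above


def rswa_visible_positions(doc_len: int, step: int, window: int = WINDOW) -> List[int]:
    """Positions the output token at index `step` (0-based) may attend to.

    Assumed layout: [doc_0 .. doc_{doc_len-1}, out_0, out_1, ...].
    The whole document stays visible; of the outputs, only the previous
    `window` tokens are (an assumption about how the window is counted).
    """
    document = list(range(doc_len))
    first_recent = max(0, step - window)
    recent_outputs = [doc_len + i for i in range(first_recent, step)]
    return document + recent_outputs


def cache_entries(doc_len: int, outputs_written: int, window: Optional[int]) -> int:
    """Number of KV cache entries held after `outputs_written` output tokens.

    window=None models a standard causal decoder that keeps every output token;
    an integer window models the sliding-window variant sketched here.
    """
    if window is None:
        return doc_len + outputs_written
    return doc_len + min(outputs_written, window)
```

With an illustrative 1,000-token document and 50,000 output tokens (both numbers made up for the example), `cache_entries(1000, 50_000, None)` gives 51,000 entries, while `cache_entries(1000, 50_000, WINDOW)` gives 1,128 and stays there no matter how much more is written.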
Think of it as
A scribe copying a long book — the source kept open on the desk, and only the last line they wrote still in view.






