Researchers from NYU, Columbia, Princeton, and others introduce LCLMs, achieving 16x context compression and 8.8x faster inference with no accuracy loss.

LCLMs compress the LLM context before decoding, running 8.8x faster at 16x compression and outperforming every KV cache method tested. Open-sourced by NYU and Columbia.
