Tweet 1
Every LLM call burns GPU cycles on tokens that never needed to run.
Padding. Boilerplate. Irrelevant context.
I built SuperCompress — a tiny CPU policy that cuts 65% of tokens before inference.
Open source. MIT. Free tier.
A short thread-style post about SuperCompress, an open-source prompt compressor that cuts token counts by 65%.

A technical deep-dive into building SuperCompress - a 5K parameter CPU policy that compresses LLM prompts by 65% with 100% oracle…

I built an open-source prompt compressor now available on PyPI. Here's the story.

SuperCompress - open source LLM prompt compression - is now available on PyPI. 65% fewer tokens, 100% oracle recall.

If you're building AI agents or running LLM pipelines in production, you already know the pain: tool...
