Tweet 1

Every LLM call burns GPU cycles on tokens that never needed to run.

Padding. Boilerplate. Irrelevant context.

I built SuperCompress — a tiny CPU policy that cuts 65% of tokens before inference.

Open source. MIT. Free tier.