Your LLM prompt doesn't fit? Pack it by priority (zero dependencies)

Every RAG app and agent eventually hits the same wall: you have more stuff than fits in the model's context window — a system prompt, chat history, retrieved documents, tool output — and a fixed token budget.

The usual "fix" is to truncate the whole blob at the end. That chops off whatever happens to come last, with no regard for how much it matters: sometimes a retrieved doc, sometimes half your system prompt. You drop the wrong things.

I got tired of rewriting that logic in every project, so I built contextcram — a tiny, zero-dependency library that treats this as a prioritized packing problem.

The idea

Give each piece of context a priority and a strategy for what should happen if it doesn't fit. Set a token budget. contextcram assembles the largest in-budget context that keeps the important parts.
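
To make the idea concrete, here is a minimal sketch of priority-based packing. It is not contextcram's actual API: `Piece`, `pack`, the `"drop"`/`"truncate"` strategy names, and the word-count tokenizer are all hypothetical stand-ins chosen for illustration. It uses a simple greedy pass (highest priority first), which keeps the important parts but is not guaranteed to find the largest possible in-budget combination.

```python
from dataclasses import dataclass
from typing import List, Tuple


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


@dataclass
class Piece:
    name: str
    text: str
    priority: int           # higher means more important
    strategy: str = "drop"  # hypothetical: "drop" or "truncate" when it doesn't fit


def pack(pieces: List[Piece], budget: int) -> List[Tuple[str, str]]:
    """Greedily keep pieces by descending priority, then emit in original order."""
    remaining = budget
    kept = {}
    # sorted() is stable, so equal priorities keep their original relative order.
    by_priority = sorted(range(len(pieces)), key=lambda i: -pieces[i].priority)
    for i in by_priority:
        piece = pieces[i]
        cost = count_tokens(piece.text)
        if cost <= remaining:
            kept[i] = piece.text
            remaining -= cost
        elif piece.strategy == "truncate" and remaining > 0:
            # Keep only as many leading words as the leftover budget allows.
            words = piece.text.split()[:remaining]
            kept[i] = " ".join(words)
            remaining -= len(words)
        # Otherwise the piece is dropped entirely.
    return [(pieces[i].name, kept[i]) for i in sorted(kept)]


pieces = [
    Piece("system", "You are a helpful assistant.", priority=100),
    Piece("history", "user: hi assistant: hello there how can I help",
          priority=50, strategy="truncate"),
    Piece("docs", "Doc A says the sky is blue. Doc B says grass is green.",
          priority=10),
]

for name, text in pack(pieces, budget=10):
    print(f"[{name}] {text}")
```

With a budget of 10 "tokens", the system prompt survives whole, the chat history is cut to what fits in the leftover budget, and the low-priority docs are dropped. A real implementation would swap in the model's actual tokenizer, since word counts only approximate token counts.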
