Extractive vs abstractive compression of retrieved chunks. Sentence-level filtering. How to cut tokens without losing the answer.

Pre-filter by metadata to shrink the search space before the vector index runs. The recall lift, the cardinality trap, and the code.

Most RAG pipelines embed the user's raw query and skip the cheapest recall win there is: rewriting it before search.

Extractive vs abstractive compression of retrieved chunks. Sentence-level filtering. How to cut tokens without losing the answer.