Your LLM has a 128K-token context window.
Your document has 150K words (roughly 200K tokens for English text).
Something has to give. What do you do?
A) Chunk the document into fixed-size pieces and embed each one, then retrieve the top-k chunks at query time.
B) Use a sliding window — process the document in overlapping chunks, stitch the outputs together.
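To make the two options concrete, here is a minimal sketch rather than a production recipe. The names `chunk_words`, `bow`, `cosine`, and `top_k` are hypothetical helpers invented for this example. Setting `overlap=0` gives the fixed-size pieces from option A, and a positive `overlap` gives the overlapping windows from option B. A bag-of-words cosine score stands in for a real embedding model, which is an assumption made only to keep the example self-contained. Sizes are counted in words here, not tokens.

```python
import math
from collections import Counter


def chunk_words(text, size=200, overlap=50):
    """Split text into windows of `size` words.

    Consecutive windows share `overlap` words; overlap=0 yields
    plain fixed-size chunks.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def bow(text):
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, chunks, k=3):
    """Return the k chunks most similar to the query."""
    q = bow(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, bow(c)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    doc = "a b c d e f g h i j"
    windows = chunk_words(doc, size=4, overlap=2)
    print(windows)
    print(top_k("g h", windows, k=1))
```

In a real pipeline the `bow` and `cosine` pair would be swapped for an embedding model and a vector index, and chunk sizes would be measured with the model's own tokenizer.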







