I built a small RAG (Retrieval-Augmented Generation) project where a user can ask questions about a PDF, and the LLM answers from that PDF and cites the page number to look at. The stack is LangChain, OpenAI embeddings, and Qdrant running in Docker.
A small note before we start: this same basic pipeline is what powers web apps such as the "AI Tutor" in Educative or an "AI web page builder". The only thing that changes between those products and my PDF Q&A is the data source. That is the key idea to take away.
What RAG is, in one line
Take a document → break it into small chunks → turn each chunk into a vector (a list of numbers) → store those vectors in a database. Later, when the user asks a question, turn the question into a vector too, find the closest chunks, and feed them to an LLM as context.
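To make that flow concrete, here is a minimal, dependency-free sketch of the same steps. It is not the project's actual code: the word-count `embed` function is a toy stand-in for a real embedding model (such as OpenAI embeddings), a plain Python list stands in for a vector database (such as Qdrant), and all function names and the sample `PAGES` data are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a real model's vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk_text(text: str, size: int = 8) -> list[str]:
    """Break text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_index(pages: dict[int, str], size: int = 8) -> list[dict]:
    """Indexing step: chunk every page, embed each chunk, keep the page number."""
    index = []
    for page, text in pages.items():
        for chunk in chunk_text(text, size):
            index.append({"page": page, "text": chunk, "vector": embed(chunk)})
    return index


def retrieve(index: list[dict], question: str, k: int = 2) -> list[dict]:
    """Query step: embed the question and return the k closest chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda r: cosine(q, r["vector"]), reverse=True)
    return ranked[:k]


def build_prompt(hits: list[dict], question: str) -> str:
    """Assemble the retrieved chunks into context for an LLM call (not made here)."""
    context = "\n".join(f"[page {h['page']}] {h['text']}" for h in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical two-page "PDF", already extracted to text.
PAGES = {
    1: "Qdrant is a vector database that stores embeddings",
    2: "Docker runs the database inside an isolated container",
}
INDEX = build_index(PAGES)
```

Calling `retrieve(INDEX, "Which vector database stores embeddings?", k=1)` ranks the page 1 chunk first, and `build_prompt` turns the hits into the context string you would send to the LLM. Swapping the toy pieces for a real embedding model and a real vector store changes the quality of the matches, not the shape of the pipeline.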
INDEXING (run once)