What I learned building a document chunking and embedding API for RAG

Chunking sounds like the boring part of RAG. It is also where a lot of retrieval quality is won or lost. I built a document chunking and embedding API and ran it in production, and these are the things that actually moved the needle.

Repo: https://github.com/ahmetguness/doc-chunking-api

Live demo (3 free runs): https://chunkingservice.com

Sentence-aware beats fixed-size

The naive approach is to split text every N characters or tokens. It is simple, and it quietly hurts retrieval: a fixed boundary cuts sentences in half and splits ideas across chunks. Sentence-aware chunking with a configurable overlap keeps each chunk coherent, so the embedding represents a complete thought, and the overlap carries context across chunk boundaries. In my experience, this one change usually improves retrieval more than swapping embedding models.
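To make the idea concrete, here is a minimal sketch of sentence-aware chunking with overlap. It is not the code from the repo: the function names (split_sentences, chunk_sentences), the regex-based sentence splitter, and the character budget (max_chars) are all assumptions chosen to keep the example short. A real pipeline would likely count tokens and use a proper sentence tokenizer.

```python
import re

# Assumption: a naive splitter that breaks after ., ! or ? followed by
# whitespace. It will mis-split abbreviations such as "e.g. this".
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Return the non-empty sentences found in text."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def chunk_sentences(text, max_chars=500, overlap_sentences=1):
    """Group whole sentences into chunks of roughly max_chars characters.

    Each new chunk starts with the last `overlap_sentences` sentences of
    the previous chunk. A chunk can exceed max_chars when a single
    sentence (plus the carried-over overlap) is longer than the budget,
    because sentences are never cut.
    """
    chunks = []
    current = []  # sentences collected for the chunk being built
    for sentence in split_sentences(text):
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            # Budget exceeded: emit the chunk, keep the tail as overlap.
            chunks.append(" ".join(current))
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Calling chunk_sentences(document_text, max_chars=500, overlap_sentences=1) returns a list of strings that each end on a sentence boundary, with one sentence shared between neighbouring chunks. Setting overlap_sentences=0 turns the overlap off, which is a quick way to compare retrieval with and without it.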

