TL;DR: Phase 1 of a from-scratch RAG app — Spring AI, pgvector, local Ollama — ends with a working pipeline and two failures that look identical from the outside but have nothing to do with each other. One was a chunking bug. The other was a 3B model running out of brain. Here's how I told them apart.

Why build this

I'm finishing my apprenticeship as a IT specialist in application development and wanted a portfolio project that's more than another CRUD app. Kenning is a document-chat tool: upload a file, ask questions about it, get answers with sources attached. Standard RAG (Retrieval-Augmented Generation), but built end to end by hand instead of stitched together from a tutorial.

Phase 0 was infrastructure: Docker Compose with pgvector/pgvector:pg16 and ollama/ollama, a Spring Boot scaffold, an Angular scaffold. Phase 1's job was narrower and more important: prove the actual RAG loop works — upload one document, ask one question, get a real answer with the source attached. No login, no UI polish, no multi-document handling. Just: does this architecture actually do the thing.

The stack