I rebuilt my Financial Mentor retrieval from scratch. Here's everything the RAG stack taught me

From stuffing JSON into Claude to GraphRAG, hybrid search, CRAG, and adversarial evaluation — the complete honest account

The problem with FinMentor started before I had the vocabulary to describe it...

Users were asking reasonable questions about their portfolios. The system was answering them. Some answers were right. Some answers were wrong. And I couldn't explain the pattern because I hadn't looked at what was actually flowing into the model.

When I looked: every query was receiving the full IBKR portfolio snapshot. JSON format. Five positions, monthly P&L, thirty transactions, account metadata. The same 847 tokens regardless of what was asked. A question about sector concentration got the full transaction history. A question about a single ticker got every other position. Maybe 10% of the context was relevant to any given question. The other 90% was noise competing for attention and billing me for the privilege.

I wasn't doing retrieval. I was doing copy-paste with extra steps.
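To make the contrast concrete, here is a minimal sketch of the difference between stuffing the whole snapshot into every prompt and sending only the slice a question seems to need. Everything in it is an assumption for illustration: the `SNAPSHOT` shape, the tickers and values, and the function names `stuff_everything` and `select_context` are hypothetical, not FinMentor's actual code or the real IBKR export format. The keyword routing is deliberately crude and is not the retrieval approach the rest of this account covers.

```python
import json
import re

# Hypothetical snapshot shape and values; a real IBKR export will differ.
TICKERS = ["AAPL", "MSFT", "JNJ", "XOM", "KO"]
SNAPSHOT = {
    "account": {"id": "U0000000", "base_currency": "USD"},
    "positions": [
        {"ticker": "AAPL", "sector": "Technology", "market_value": 12000.0},
        {"ticker": "MSFT", "sector": "Technology", "market_value": 9000.0},
        {"ticker": "JNJ", "sector": "Healthcare", "market_value": 4000.0},
        {"ticker": "XOM", "sector": "Energy", "market_value": 3000.0},
        {"ticker": "KO", "sector": "Consumer Staples", "market_value": 2000.0},
    ],
    "monthly_pnl": [{"month": f"2024-{m:02d}", "pnl": 100.0 * m} for m in range(1, 13)],
    "transactions": [
        {"id": i, "ticker": TICKERS[i % 5], "qty": 1 + i % 3} for i in range(30)
    ],
}


def stuff_everything(snapshot, question):
    """The copy-paste approach: the same full snapshot, whatever was asked."""
    return json.dumps(snapshot)


def select_context(snapshot, question):
    """Naive keyword routing: send only the slice the question seems to need."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    mentioned = {
        p["ticker"] for p in snapshot["positions"] if p["ticker"].lower() in words
    }
    if mentioned:
        # Single-ticker question: that position and its transactions only.
        context = {
            "positions": [p for p in snapshot["positions"] if p["ticker"] in mentioned],
            "transactions": [
                t for t in snapshot["transactions"] if t["ticker"] in mentioned
            ],
        }
    elif words & {"sector", "sectors", "concentration"}:
        # Sector question: positions are enough, no transaction history.
        context = {"positions": snapshot["positions"]}
    else:
        # Unknown intent: fall back to the full snapshot.
        context = snapshot
    return json.dumps(context)


if __name__ == "__main__":
    for question in ["How is AAPL doing?", "What is my sector concentration?"]:
        full = stuff_everything(SNAPSHOT, question)
        sliced = select_context(SNAPSHOT, question)
        # Character counts are only a rough proxy for tokens.
        print(f"{question!r}: {len(sliced)} chars sent instead of {len(full)}")
```

Even this crude routing shows the shape of the fix: the question decides what goes into the context, rather than the context being fixed before the question arrives.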

Related reading

From 10% to 57% Accuracy on FinanceBench: What Actually Moved the Needle

Why RAG gives wrong answers (and how to fix retrieval failures)

5 Failure Modes I Found in My Financial RAG (And the One That Actually Mattered)

Why my first RAG system hallucinated (and how I fixed it)

MCP + RAG: Why I Stopped Building Complex RAG Systems After MCP Changed…

I Built RAG From Scratch in Python to Understand It. Here's What I Learned.
