A few months ago a friend showed me two tools: Repomix and code2prompt. The idea was simple: point them at your project folder, they package everything into one file, and you paste that file into an LLM to ask questions about your whole codebase at once. For his pure Python projects they worked great.
I was working on a data analytics project at the time: dimension and fact CSVs, a SQL dump, some Power BI files, Jupyter notebooks with ML models. I ran Repomix on it and got a 22,085 KB output file. code2prompt gave me 9,304 KB. I tried pasting each of them into Claude. It choked immediately on both.
So I opened the files to see what was actually inside them. What I found was the root of the problem.
What These Tools Get Wrong for Data Projects
Repomix and code2prompt are built for code repos. They operate on a simple principle: read every file, dump every file. That works fine when your project is Python scripts and config files. It completely falls apart when your project looks like mine.
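To make "read every file, dump every file" concrete, here is a minimal Python sketch of that principle. It is not the actual source of either tool, and the real ones have ignore rules and other refinements that this skips; `naive_pack` and `bytes_by_extension` are hypothetical names made up for illustration.

```python
from collections import Counter
from pathlib import Path


def naive_pack(root: str) -> str:
    """Concatenate every file under root into one big string.

    Hypothetical illustration of the "read every file, dump every file"
    idea, not how Repomix or code2prompt are actually implemented.
    """
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the sketch from crashing on non-text files.
        text = path.read_text(encoding="utf-8", errors="replace")
        header = f"=== {path.relative_to(root).as_posix()} ==="
        parts.append(f"{header}\n{text}\n")
    return "".join(parts)


def bytes_by_extension(root: str) -> Counter:
    """Total file size in bytes per extension, to see where the bulk is."""
    sizes = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            sizes[path.suffix.lower() or "(none)"] += path.stat().st_size
    return sizes
```

Running something like `bytes_by_extension("my_project").most_common(5)` on a folder before packing it is a quick way to check whether the data files, rather than the code, would dominate the output.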






