A while back I caught myself in an interview saying "yeah, I use Parquet a lot," and then realized I couldn't actually explain why it's faster than a CSV beyond hand-waving about "columnar." That bugged me. So I did the thing that always fixes this for me: I rebuilt it from scratch.
The result is Columna, a columnar storage engine and file format written in pure Python. No pandas, no pyarrow, no numpy. Just struct and zlib from the standard library. It's about 3,000 lines and has 81 tests. On a 50,000-row dataset, the Columna file ends up 91% smaller than the equivalent CSV, and answering a filtered query reads 98% fewer bytes.
Live demo + file inspector: https://hajirufai.github.io/columna/
Code: https://github.com/hajirufai/columna
Here's what I learned building it.






