Last month, I spent three days wrestling with 500 PDF invoices. Each one had the same data (vendor name, invoice number, total amount), but the layouts were all over the place. Different fonts, missing headers, tables that somehow broke across pages. I tried regex. I tried OCR with layout analysis. I even tried building a rule-based parser that looked for keywords like "Total:".

Nothing worked reliably. Every time I fixed one pattern, another invoice broke. I was one commit away from throwing my laptop out the window.

Then I took a step back. I realized I didn't need to understand every layout variation. I just needed to understand the data. And that's where AI came in.

What didn’t work

Let me be clear: I tried the usual suspects first.