I have a confession: I once spent three full days writing regular expressions to parse doctor’s appointment emails from different providers. By the end, I had a 400-line monstrosity that worked for exactly two email formats. When a third clinic joined the system, I knew it was time for a different approach.

The Problem: Unstructured Text Everywhere

I was building a small integration that needed to extract structured data (dates, times, names, and addresses) from free-form text. The sources were diverse: emails, Slack messages, even scanned PDF notes. Each had its own quirks. Regex was brittle: a pattern tuned to one provider’s layout broke on the next. BeautifulSoup couldn’t help when there was no HTML to parse. I tried custom NLP pipelines with spaCy, but training a new entity type for every field was overkill for a project this size.

My team’s internal tool was on the verge of shipping, but every new text source meant another round of writing and debugging regex patterns.

What Didn't Work