Email signatures are the most valuable dataset your CRM is throwing away. Roughly 82% of business email carries a signature with at least a name and title — and usually a phone number, a LinkedIn URL, a company name, sometimes a whole org-chart hint. That's structured data masquerading as prose, delivered free with every message, and most platforms scroll right past it.

You don't need a data vendor or even an LLM to harvest it. A few hundred lines of regex, a cross-referencing trick, and a dedicated inbox for the agent doing the work get you to production-usable accuracy. Here's the build.

Regex beats the LLM here (really)

For genuinely unstructured prose, a model wins. Signatures aren't unstructured; they're predictably structured: 3–6 lines, often separated from the body by the "-- " delimiter (two hyphens and a trailing space) described in RFC 3676, drawing from a small set of field types. A regex pass catches over 95% of well-formed signatures, runs in microseconds, and costs nothing per message. Keep the LLM as a fallback for the weird 5%, and skip it entirely in version one.
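To make the "small set of field types" idea concrete, here is a minimal sketch of a regex pass over signature lines. The pattern set, the `FIELD_PATTERNS` name, and the `extract_fields` helper are illustrative assumptions rather than the patterns used in this build; real signatures will need broader coverage (international phone formats, bare LinkedIn handles, and so on).

```python
import re

# Hypothetical, deliberately narrow patterns for three common field types.
FIELD_PATTERNS = {
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "linkedin": re.compile(r"https?://(?:www\.)?linkedin\.com/in/[\w-]+", re.I),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def extract_fields(signature_lines):
    """Return the first match for each field type found in the given lines."""
    found = {}
    for line in signature_lines:
        for name, pattern in FIELD_PATTERNS.items():
            if name in found:
                continue  # keep the first hit per field type
            match = pattern.search(line)
            if match:
                found[name] = match.group(0)
    return found


sample = [
    "Jane Doe",
    "VP Engineering, Example Corp",
    "+1 (555) 010-1234",
    "https://www.linkedin.com/in/jane-doe",
]
print(extract_fields(sample))
```

Name, title, and company are harder to pin down with a single pattern, so this sketch leaves them out. It also assumes the signature lines have already been isolated from the message body.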

Find the boundary first: