Last month I inherited a project that needed to extract product information from a legacy e‑commerce site. The HTML was a nightmare—no semantic classes, inconsistent attribute names, and the occasional blob of inline JavaScript. I thought I could just write a few regular expressions and be done in an hour. Six hours later I was staring at a wall of conditional logic that broke every time the page changed.
I needed a better way, and I ended up using a large language model (LLM) to handle the fuzzy extraction. Here’s what I learned—dead ends included—and a working approach you can copy‑paste today.
The Problem
The site had product cards like this:
<div id="prod_123">