I stopped fighting broken parsers — here's how I use LLMs to extract web data reliably

A few months ago, I was building a price tracker for limited-edition sneakers. I had a list of 50+ store URLs, and I needed to extract product name, price, availability, and size options. Classic scraping, right?

I started with CSS selectors. BeautifulSoup + requests. It worked for about a week. Then one site changed its class names. Another added a dynamic loader. A third injected ads that shifted the DOM. I spent more time fixing selectors than actually using the data.

I tried regex on the raw HTML. That was a disaster — fragile and unreadable. I tried headless browsers with Playwright, waiting for specific elements. Still broke when the layout changed.

The problem was fundamental: I was trying to reverse-engineer the presentation layer. But what I really wanted was the meaning of the content — the product's price, not the CSS class it lived in.
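
To make that fragility concrete, here is a minimal sketch of class-based extraction breaking after a redesign. It is not the tracker's actual code: the markup, the class names (`price`, `pdp-price__value`), and the helper `extract_by_class` are all made up for illustration, and the standard library's `html.parser` stands in for a BeautifulSoup selector so the snippet runs without extra installs.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text of the first element carrying a given CSS class.

    Simplified on purpose: it assumes well-formed markup and does not
    special-case void elements such as <br> inside the matched element.
    """

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # > 0 while we are inside the matched element
        self.text = None    # stays None if the class never appears
        self._done = False  # True once the first match has been closed

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if self.depth:
            self.depth += 1  # nested tag inside the matched element
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1
            self.text = ""

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self._done = True

    def handle_data(self, data):
        if self.depth:
            self.text += data


def extract_by_class(html, css_class):
    """Return the stripped text of the first element with css_class, or None."""
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    return parser.text.strip() if parser.text is not None else None


# Hypothetical markup: the same product before and after a site redesign.
BEFORE = '<div class="product"><span class="price">$220.00</span></div>'
AFTER = '<div class="product"><span class="pdp-price__value">$220.00</span></div>'

price_before = extract_by_class(BEFORE, "price")  # finds the price
price_after = extract_by_class(AFTER, "price")    # class renamed, nothing found

print(price_before, price_after)
```

The price is still on the page in both versions; only the class name changed. The extractor is tied to the presentation layer, so it returns nothing for the second page even though the meaning of the content is identical.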

The turning point: LLMs for structured extraction
