I Gave Up on CSS Selectors: Using LLMs for Web Scraping

A few months ago, I was building a small side project that needed to compare prices across a dozen e-commerce sites. Simple enough, right? I've written scrapers before. BeautifulSoup, a few clever selectors, maybe a regex or two – that's all you need.

Except it wasn't. Every site had its own HTML structure. Some loaded content via JavaScript. Others deliberately obfuscated their class names. I spent more time updating selectors than writing actual logic. I was maintaining a fragile tower of CSS selectors that crumbled every time a developer at one of those stores decided to rename a div.

Frustrated, I started thinking: what if I stopped trying to parse the HTML structure and instead asked a human to read the page and give me the data? But I don't have a human. I have an API.

What I Tried That Didn't Work

First, the obvious: requests + BeautifulSoup. That worked for maybe 40% of sites. The rest either needed JavaScript rendering (which meant bringing in Selenium) or had markup so chaotic that my selectors kept breaking. I tried chaining CSS selectors, XPath, even locating elements by their position on the page. Nothing was robust.
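To make that failure mode concrete, here is a minimal sketch of a class-based lookup, roughly what a call like `soup.select_one(".price")` does in BeautifulSoup. To keep it dependency-free, it uses the standard library's `html.parser` instead of BeautifulSoup, and the `extract_by_class` helper, the class names, and the sample markup are all hypothetical. The point is how little it takes to break it: rename one class and the extractor silently returns nothing.

```python
from html.parser import HTMLParser
from typing import List, Optional

# Tags that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text inside the first element carrying a given CSS class."""

    def __init__(self, css_class: str) -> None:
        super().__init__()
        self.css_class = css_class
        self.depth = 0          # > 0 while we are inside the matched element
        self.found = False      # becomes True once the first match is seen
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            # Nested element inside the match: track depth so we know when it ends.
            if tag not in VOID_TAGS:
                self.depth += 1
            return
        if self.found:
            return  # only the first match matters
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.found = True
            if tag not in VOID_TAGS:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


def extract_by_class(html: str, css_class: str) -> Optional[str]:
    """Return the stripped text of the first element with css_class, or None."""
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    parser.close()
    if not parser.found:
        return None
    return "".join(parser.parts).strip()


# The same product card before and after a front-end developer renames a class.
old_page = '<div class="product"><span class="price">$19.99</span></div>'
new_page = '<div class="product"><span class="price-tag">$19.99</span></div>'

print(extract_by_class(old_page, "price"))  # $19.99
print(extract_by_class(new_page, "price"))  # None
```

Nothing in the second page is harder to read for a person; the price is still right there. The selector simply encodes an assumption about markup that the site never promised to keep, which is exactly the maintenance treadmill described above.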
