I spent 3 days scraping a site until I tried LLMs for data extraction

I’m not proud of it, but I spent three days building a scraper for an e‑commerce site that kept changing its HTML classes overnight. The first version used BeautifulSoup with CSS selectors. It worked for exactly four hours. Then the site pushed a new build, all the class names became hashed, and my carefully crafted selectors turned into wet cardboard. I patched it with regex. That held for another day until they changed the ordering of fields in the product cards. I was losing my mind.

This story isn’t about that specific site, and it’s not about any single tool. It’s about the moment I stopped trying to outsmart inconsistent markup and started treating the whole page as a blob of text that a language model could parse. It was a shift from “find the pattern” to “understand the meaning”, and it changed how I approach scraping.

The problem: semi‑structured web data

I needed to extract product name, price, description, and inventory status from dozens of product listing pages – some with pagination, some with infinite scroll, all with different HTML structures. The data was there, but the containers were unpredictable. One page used <div class="price">, another used <span class="final-amount">, and a third had the price inside a <meta> tag. Writing a universal parser was like playing Whac‑A‑Mole.
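To make the "blob of text" idea concrete, here is a minimal sketch of how the three price variants above could be flattened into plain text before being handed to a model. It is an illustration under assumptions, not the exact code I ran: the sample pages, the product name and price in them, the PageFlattener class, the field keys, and the call_llm parameter are all made up for this example. call_llm stands in for whatever model client you use (prompt string in, reply string out), and the sketch assumes the model replies with bare JSON.

```python
import json
from html.parser import HTMLParser


class PageFlattener(HTMLParser):
    """Collect visible text plus labelled <meta content="..."> values.

    Tag names and class names are ignored entirely, so hashed or renamed
    classes do not affect the output.
    """

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta":
            attr_map = dict(attrs)
            label = (attr_map.get("itemprop") or attr_map.get("property")
                     or attr_map.get("name"))
            content = attr_map.get("content")
            if label and content:
                self.chunks.append(f"{label}: {content}")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def flatten(html):
    """Return the page as newline-separated text chunks."""
    parser = PageFlattener()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Hypothetical key names for the four fields I needed.
FIELDS = ["name", "price", "description", "inventory_status"]


def build_prompt(page_text):
    return (
        "Extract these fields from the product page text below and reply "
        f"with a single JSON object using exactly these keys: {FIELDS}. "
        "Use null for anything that is missing.\n\n" + page_text
    )


def extract_product(html, call_llm):
    """call_llm is a placeholder callable: prompt string in, reply string out."""
    reply = call_llm(build_prompt(flatten(html)))
    record = json.loads(reply)  # assumes bare JSON; real replies may need cleanup
    return {field: record.get(field) for field in FIELDS}


# Made-up sample markup mirroring the three price containers described above.
PAGES = [
    '<div class="card"><h2>Mug</h2><div class="price">$12.00</div></div>',
    '<div class="x9f"><h2>Mug</h2><span class="final-amount">$12.00</span></div>',
    '<div><h2>Mug</h2><meta itemprop="price" content="12.00"></div>',
]

for page in PAGES:
    print(flatten(page).replace("\n", " | "))
```

The point of the design is that nothing in flatten depends on a class name or on where the price sits in the card, so the same text reaches the model whichever variant the site serves. The trade-off is that the model now carries the job of deciding which number is the price, so its replies still need validating.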
