I’m not proud of it, but I spent three days building a scraper for an e‑commerce site that kept changing its HTML classes overnight. The first version used BeautifulSoup with CSS selectors. It worked for exactly four hours. Then the site pushed a new build, all the class names became hashed, and my carefully crafted selectors turned into wet cardboard. I patched it with regex. That held for another day until they changed the ordering of fields in the product cards. I was losing my mind.
This story isn’t about that specific site, and it’s not about any single tool. It’s about the moment I stopped trying to outsmart inconsistent markup and started treating the whole page as a blob of text that a language model could parse. It was a shift from “find the pattern” to “understand the meaning”. That changed everything.
The problem: semi‑structured web data
I needed to extract product name, price, description, and inventory status from dozens of product listing pages: some paginated, some with infinite scroll, all with different HTML structures. The data was there, but the containers were unpredictable. One page used <div class="price">, another used <span class="final-amount">, and a third had the price inside a <meta> tag. Writing a universal parser was like playing Whac‑A‑Mole.
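To make the Whac‑A‑Mole problem concrete, here is a minimal sketch of the kind of fallback parser I kept patching. It is not the original scraper. It uses Python's standard-library html.parser instead of BeautifulSoup so it runs without dependencies. The (tag, class) pairs come from the three layouts above. The itemprop="price" attribute on the <meta> tag is my assumption, because sites mark that tag up in different ways. PriceFinder, extract_price and PRICE_CLASSES are illustrative names.

```python
from html.parser import HTMLParser

# Hypothetical list of known price containers, one per layout seen so far.
# Every new layout means another entry here: the Whac-A-Mole problem.
PRICE_CLASSES = {("div", "price"), ("span", "final-amount")}


class PriceFinder(HTMLParser):
    """Record the first price found in a known container or a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.price = None
        self._capture_tag = None  # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self.price is not None:
            return  # keep only the first match
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "meta" and attrs.get("itemprop") == "price":
            # Assumption: the <meta> variant uses itemprop="price".
            self.price = attrs.get("content")
        elif any((tag, c) in PRICE_CLASSES for c in classes):
            self._capture_tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._capture_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Simplification: assumes no nested element with the same tag name
        # inside the price container.
        if tag == self._capture_tag:
            self.price = "".join(self._buffer).strip()
            self._capture_tag = None


def extract_price(html: str):
    """Return the first recognised price string, or None if no rule matched."""
    finder = PriceFinder()
    finder.feed(html)
    finder.close()
    return finder.price


if __name__ == "__main__":
    pages = [
        '<div class="card"><div class="price">$19.99</div></div>',
        '<span class="final-amount"> $24.50 </span>',
        '<meta itemprop="price" content="12.00">',
        '<p class="x7f3a">$9.99</p>',  # hashed class after a redeploy
    ]
    for page in pages:
        print(extract_price(page))
    # prints:
    # $19.99
    # $24.50
    # 12.00
    # None
```

The last sample shows the failure mode I kept hitting. Once a redeploy hashes the class names, the parser does not crash. It quietly returns None, and the only fix is another hand-written rule.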
