A few months ago, I was building a small side project that needed to compare prices across a dozen e-commerce sites. Simple enough, right? I've written scrapers before. BeautifulSoup, a few clever selectors, maybe a regex or two – that's all you need.
Except it wasn't. Every site had its own HTML structure. Some loaded content via JavaScript. Others deliberately messed up class names. I spent more time updating selectors than writing actual logic. I was maintaining a fragile tower of CSS selectors that crumbled every time a developer at one of those stores decided to rename a div.
Frustrated, I started thinking: what if I stopped trying to parse the HTML structure and instead asked a human to read the page and give me the data? But I don't have a human. I have an API.
What I Tried That Didn't Work
First, the obvious: requests + BeautifulSoup. That worked for maybe 40% of sites. The rest either required JavaScript rendering (Selenium) or had such chaotic markup that my selectors kept breaking. I tried using CSS selector chaining, XPath, even scraping by position on the page. Nothing was robust.






