I spent three weeks building a web scraper for a side project that aggregated job listings from multiple startup boards. Every site had its own HTML quirks: one used <div class="job-title">, another hid titles inside <h2> tags with no class, and a third relied on JavaScript rendering that even Selenium struggled with. I’d fix one site, and a week later it would change its markup and my script would crash again. I was playing whack-a-mole with CSS selectors and regex.

I needed something that could just understand the content, not memorize its structure. That’s when I turned to large language models (LLMs).

The Breaking Point

I remember spending an entire Saturday debugging why my BeautifulSoup parser returned None for a listing that was clearly on the page. The site had made a seemingly arbitrary change to an aria-label, and that one change broke my selector chain. My code looked like this:

from bs4 import BeautifulSoup