TL;DR

Zero-shot JSON extraction replaces brittle CSS selectors with Large Language Models that map unstructured web content to predefined schemas semantically. By processing cleaned HTML or Markdown through an LLM context window, scraping pipelines become resilient to UI changes, A/B tests, and dynamic class names. This approach shifts data engineering effort from constant selector maintenance to high-level schema definition, enabling truly agentic data collection.
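The flow described above (clean the page, hand it to a model together with a schema, parse the JSON reply) can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: `PRODUCT_SCHEMA`, `clean_html`, `build_prompt`, `extract`, and the `complete` callable are all hypothetical names, and `fake_llm` is a stub standing in for whichever model client you actually use.

```python
import json
from html.parser import HTMLParser
from typing import Callable

# Hypothetical target schema: field name -> expected Python type.
PRODUCT_SCHEMA = {"name": str, "price": float, "in_stock": bool}


class _TextOnly(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def clean_html(html: str) -> str:
    """Reduce markup to plain text so class names and layout never reach the model."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def build_prompt(text: str, schema: dict) -> str:
    """Describe the schema in the prompt; no per-site examples are included (zero-shot)."""
    fields = ", ".join(f'"{k}" ({t.__name__})' for k, t in schema.items())
    return (
        "Extract the following fields from the page text and reply with "
        "a single JSON object only.\n"
        f"Fields: {fields}\n\nPage text:\n{text}"
    )


def extract(html: str, schema: dict, complete: Callable[[str], str]) -> dict:
    """`complete` is an assumed callable that sends a prompt to an LLM and returns its text."""
    data = json.loads(complete(build_prompt(clean_html(html), schema)))
    for key, expected in schema.items():
        if key not in data:
            raise ValueError(f"missing field: {key}")
        value = data[key]
        # JSON has one number type; accept whole numbers where a float is expected.
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            value = data[key] = float(value)
        if not isinstance(value, expected):
            raise ValueError(f"field {key!r} is not of type {expected.__name__}")
    return {key: data[key] for key in schema}


# Usage with a stub in place of a real model call.
def fake_llm(prompt: str) -> str:
    return '{"name": "Acme Kettle", "price": 24.99, "in_stock": true}'


page = (
    '<div class="x9f2"><script>var a = 1;</script>'
    "<h1>Acme Kettle</h1><p>$24.99</p><p>In stock</p></div>"
)
result = extract(page, PRODUCT_SCHEMA, fake_llm)
```

Because the model only sees text and a field list, renaming `x9f2` or reordering the elements does not change the code above; the validation step is there because a model reply is not guaranteed to match the schema.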

The Selector Maintenance Trap

Web scraping pipelines eventually hit the same bottleneck: selector maintenance. Traditional data extraction relies on identifying structural patterns in the Document Object Model (DOM). You write rules targeting specific HTML nodes using tools like XPath, BeautifulSoup, or Cheerio.

A standard selector might look like div.product-details > span:nth-child(3) > b.price-tag.
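To make the fragility concrete, here is a small sketch using only Python's standard library. The markup is invented for illustration, and `xml.etree.ElementTree` with its limited XPath support stands in for a real HTML parser such as BeautifulSoup; the path is an approximate equivalent of the selector above, since every child of the container here is a `span`.

```python
import xml.etree.ElementTree as ET

# Approximate ElementTree equivalent of
# div.product-details > span:nth-child(3) > b.price-tag
PRICE_PATH = ".//div[@class='product-details']/span[3]/b[@class='price-tag']"

# Hypothetical product page as originally scraped.
ORIGINAL = """
<html><body>
  <div class="product-details">
    <span>Acme Kettle</span>
    <span>In stock</span>
    <span><b class="price-tag">$24.99</b></span>
  </div>
</body></html>
"""

# Hypothetical redesign: a badge is inserted, pushing the price to the fourth span.
REDESIGNED = """
<html><body>
  <div class="product-details">
    <span>Acme Kettle</span>
    <span>New!</span>
    <span>In stock</span>
    <span><b class="price-tag">$24.99</b></span>
  </div>
</body></html>
"""


def extract_price(markup: str):
    """Return the price text, or None when the positional path no longer matches."""
    root = ET.fromstring(markup.strip())
    node = root.find(PRICE_PATH)
    return node.text if node is not None else None


print(extract_price(ORIGINAL))
print(extract_price(REDESIGNED))
```

The price is still on the redesigned page and still carries the same class, yet the positional rule silently returns nothing. That silent failure is the maintenance cost this section describes.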

dev.to

Replacing Fragile CSS Selectors with LLM-Powered Zero-Shot JSON Extraction

Sunday, 14 June 2026
