Thunderbit Launches High-Fidelity Web Data API, MCP Server, and CLI
Thunderbit, an AI web data platform with over 100,000 users, today launched its developer API, Model Context Protocol (MCP) server, and CLI, giving developers new ways to turn complex, long-tail websites into clean Markdown or structured data for AI agents, RAG pipelines, and automation workflows.
At the center of the launch is Thunderbit Distill, an adaptive HTML-to-Markdown engine designed for high-fidelity conversion across complex web pages. In internal HTML-to-Markdown evaluations, Distill scored 0.87 ROUGE-L and produced cleaner, more complete Markdown across product pages, pricing tables, directories, search results, reviews, and other page types, without requiring site-specific rules.
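For readers unfamiliar with the metric, ROUGE-L compares generated text with a reference by the length of their longest common subsequence (LCS). The announcement does not say which ROUGE-L variant, tokenizer, or reference set was used, so the following is only a minimal sketch under stated assumptions: whitespace tokenization and the balanced F-measure. The function names are illustrative and are not part of any Thunderbit tooling.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # Row-by-row dynamic programming; prev[j] is the LCS length for the
    # tokens of `a` seen so far against the first j tokens of `b`.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F-measure. Assumption: whitespace tokens, equal P/R weight."""
    cand = candidate.split()
    ref = reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a score of 1.0 means the candidate and reference token sequences are identical, and a score of 0.0 means they share no tokens in order.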
Thunderbit uses AI models rather than fixed parsing rules to identify meaningful page content, then removes navigation, scripts, ads, and boilerplate so that LLMs and databases receive less noisy input.
Thunderbit also introduced Extract, which returns structured JSON or CSV from a URL using a developer-defined schema. Together, Distill and Extract cover two kinds of output: Markdown for AI agents, RAG, knowledge bases, and content ingestion, and structured data for databases, spreadsheets, enrichment jobs, and internal tools.
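To illustrate what schema-driven output means in practice, the sketch below shapes one record to a developer-defined schema and serializes it as JSON and as CSV. It does not call Thunderbit and does not show the Extract API, whose request format is not described in this announcement. The schema layout, the sample record, and the names `SCHEMA`, `conform`, `to_json`, and `to_csv` are all hypothetical.

```python
import csv
import io
import json

# Hypothetical schema: field name -> expected Python type.
SCHEMA = {"name": str, "price": float, "in_stock": bool}


def conform(record: dict, schema: dict) -> dict:
    """Keep only schema fields, in schema order, and check each value's type."""
    row = {}
    for field, expected in schema.items():
        if field not in record:
            raise KeyError(f"missing field: {field}")
        value = record[field]
        if not isinstance(value, expected):
            raise TypeError(f"{field}: expected {expected.__name__}")
        row[field] = value
    return row


def to_json(rows: list) -> str:
    """Serialize conformed rows as a JSON array."""
    return json.dumps(rows, indent=2)


def to_csv(rows: list, schema: dict) -> str:
    """Serialize conformed rows as CSV with a header taken from the schema."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(schema), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up record standing in for data pulled from a product page.
raw = {"name": "Widget", "price": 19.99, "in_stock": True, "tracking_id": "x1"}
rows = [conform(raw, SCHEMA)]
print(to_json(rows))
print(to_csv(rows, SCHEMA))
```

Fixing the field set and types up front is what lets the same record feed either a JSON consumer or a spreadsheet without per-site cleanup. Fields outside the schema, such as `tracking_id` here, are dropped.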