Building a structured dataset from the web is still a pipeline problem. You identify a data source, write or configure a scraper, design a schema, handle deduplication, schedule refreshes, and fix breakage when upstream sites change. That process stays roughly the same whether you do it once or a hundred times.

TinyFish is releasing BigSet to address that workflow directly. BigSet is an open-source multi-agent system licensed under AGPL-3.0. It takes a natural-language description as input and returns a structured, exportable dataset built from live web data. The full codebase is available on GitHub.

What is BigSet

BigSet positions itself as the layer between a data requirement and a usable table. You describe what you want in a sentence. The system infers the schema, dispatches agents to gather data, deduplicates the results, and produces a downloadable CSV or XLSX file.
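To make the last two stages concrete, here is a minimal sketch of deduplicating gathered rows and writing them out as CSV. It is not taken from the BigSet codebase: the function names `dedupe_rows` and `to_csv`, the choice of key fields, and the case-insensitive matching rule are all assumptions made for illustration, and the sample rows are invented.

```python
import csv
import io


def dedupe_rows(rows, key_fields):
    """Keep the first row seen for each normalized key (hypothetical rule)."""
    seen = set()
    unique = []
    for row in rows:
        # Normalize by trimming whitespace and lowercasing, so "Acme" and
        # " acme " count as the same entity. A real system may match
        # entities far more carefully; this is only an assumption.
        key = tuple(str(row.get(f, "")).strip().lower() for f in key_fields)
        if key in seen:
            continue
        seen.add(key)
        unique.append(row)
    return unique


def to_csv(rows, columns):
    """Serialize rows to CSV text, using only the requested columns."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Invented sample rows, as two agents might return them.
rows = [
    {"company": "Acme", "location": "SF"},
    {"company": " acme ", "location": "SF"},
    {"company": "Beta", "location": "NYC"},
]
unique = dedupe_rows(rows, key_fields=["company"])
csv_text = to_csv(unique, columns=["company", "location"])
```

Keeping the first occurrence is the simplest policy. Merging fields across duplicate rows is another option when different agents return partial records for the same entity.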

A practical example: you type “YC companies that are currently hiring engineers, with their funding stage, location, and number of open roles.” BigSet infers which columns that implies, finds the relevant entities on the web, and fills in the rows. You don’t specify a URL. You don’t configure selectors. You describe the data.
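As a rough illustration of what "inferring the columns" could mean for that prompt, the sketch below writes out one plausible schema by hand. The class name `HiringCompanyRow`, the field names, and the types are assumptions; the schema BigSet actually infers may differ, and the sample row is an invented placeholder.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class HiringCompanyRow:
    """One hypothetical row for the example prompt (not BigSet's real schema)."""
    company: str                  # the entity the row describes
    funding_stage: Optional[str]  # "funding stage" in the prompt
    location: Optional[str]       # "location" in the prompt
    open_roles: Optional[int]     # "number of open roles" in the prompt


def column_names(row_type):
    """Return the column headers implied by a row dataclass, in order."""
    return [f.name for f in fields(row_type)]


columns = column_names(HiringCompanyRow)

# Placeholder values only; fields are optional because a value may not
# be found on the web for every company.
example = HiringCompanyRow(
    company="ExampleCo", funding_stage=None, location=None, open_roles=None
)
```

Making every field except the entity name optional reflects that a gathered table can have gaps: a row is still useful when one attribute could not be found.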