Your Scraper Collected 50 Rows. There Were 4,000.

A scraper can pass every check you wrote and still be wrong about the one thing you actually care about: how much it collected.

No exception. No 500. No broken row. Exit code 0, logs green, every field valid. And the set on disk is a fraction of what the site actually has. I have run scrapers in production enough times to stop trusting a green run on its own, and this is the failure that taught me to count.

TL;DR

A paginated source can serve fewer rows than it claims and never throw: page caps, hidden offset limits, infinite scroll that "ends" early.

Your status check (200), schema check (valid row), and byte check (you got data) all pass. None of them counts records.
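To make that concrete, here is a minimal sketch of the missing fourth check. Everything in it is hypothetical: `fetch_page` is a stand-in for a real source, the `total` field assumes the source reports a claimed count somewhere (a header, a "4,000 results" label, an API field), and the page cap of 5 and the 0.99 threshold are arbitrary illustration values, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class Page:
    status: int       # HTTP-style status the source reports
    total: int        # total the source claims to have (assumed to be exposed)
    rows: list[dict]  # rows actually served on this page


def fetch_page(page: int, page_size: int = 10, total: int = 4000,
               page_cap: int = 5) -> Page:
    """Hypothetical stand-in for a paginated source with a silent page cap."""
    if page > page_cap:
        # Past the cap the source still answers 200, just with no rows.
        return Page(status=200, total=total, rows=[])
    start = (page - 1) * page_size
    stop = min(start + page_size, total)
    return Page(status=200, total=total, rows=[{"id": i} for i in range(start, stop)])


def scrape() -> tuple[list[dict], int]:
    """Walk pages until one comes back empty; return rows and the claimed total."""
    collected: list[dict] = []
    claimed = 0
    page = 1
    while True:
        result = fetch_page(page)
        claimed = result.total
        if result.status != 200 or not result.rows:
            break
        collected.extend(result.rows)
        page += 1
    return collected, claimed


def check_count(collected: int, claimed: int, min_ratio: float = 0.99) -> bool:
    """Return True only if we collected at least min_ratio of the claimed total."""
    if claimed <= 0:
        return False  # no claimed total means completeness cannot be confirmed
    return collected / claimed >= min_ratio


rows, claimed = scrape()
statuses_ok = True                              # every page returned 200
schema_ok = all("id" in row for row in rows)    # every row is valid
count_ok = check_count(len(rows), claimed)      # the check that actually fails

print(f"collected {len(rows)} of {claimed} claimed")
print(f"status ok: {statuses_ok}, schema ok: {schema_ok}, count ok: {count_ok}")
```

In this toy run the status and schema checks both pass while the count check fails, because only 50 of the 4,000 claimed rows were served. A real source may not expose a total at all, in which case the comparison has to be against something else you trust, such as the previous run's count.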

