HTTP 200 Is a Lie: A 30-Line Schema Canary for Source Drift

A scraper that returns HTTP 200 is not a scraper that returns good data. Those are two different claims, and almost every monitoring setup I've seen conflates them.

Here's the failure mode nobody writes code for. The source you scrape quietly changes. A field gets renamed, a number comes back as a string, one column goes blank. Your request still gets a 200. Your parser doesn't throw. Your job exits green. And from that day forward, every scheduled run feeds slightly-wrong records into your corpus. No alarm. No stack trace. Just slow rot.

I've got 2,190 production scraper runs behind me, 962 of them on a single Trustpilot review scraper (that's a raw lifetime run counter on my Apify profile, knotless_cadence, as of May 2026; not a controlled study, just a long-running meter). And the thing I learned running one source that many times isn't about proxies or rate limits. It's this: the most expensive failures are the ones that don't fail.

This post is about catching them cheap. There's a 30-line stdlib validator at the bottom. Copy it, run it, drop it into your pipeline today.

TL;DR

A scraper that returns HTTP 200 is not a scraper that returns good data. Those are two different claims, and almost every monitoring setup I've seen conflates them.

This post is about catching them cheap. There's a 30-line stdlib validator at the bottom. Copy it, run it, drop it into your pipeline today.

TL;DR

HTTP 200 Is a Lie: A 30-Line Schema Canary for Source Drift

HTTP 200 Is a Lie: A 30-Line Schema Canary for Source Drift

Related reading

A 200 Is Not a Read

Your recurring scraper is re-downloading data that didn't change. Here's the…

Your Scraper Collected 50 Rows. There Were 4,000.

Your Scraper Returned a Clean Row. It Was Wrong.

HTTP 200 Is Not Enough: Define a Successful AI API Request

Catch LLM Schema Drift Before It Breaks Production

Related reading

A 200 Is Not a Read

Your recurring scraper is re-downloading data that didn't change. Here's the…

Your Scraper Collected 50 Rows. There Were 4,000.

Your Scraper Returned a Clean Row. It Was Wrong.

HTTP 200 Is Not Enough: Define a Successful AI API Request

Catch LLM Schema Drift Before It Breaks Production