Your Scraper Died at Row 12,000. The Rerun Pattern.

My scraper died at row 12,000 of 50,000, three hours in. The crash itself was cheap. A process gets OOM-killed, a quota trips, a machine reboots, it happens. The expensive part came next: I re-ran it. From zero. And paid, in time and in requests, for the 11,999 rows I already had sitting on disk.

That second bill is the one nobody writes code for. This post is the code. It's about 40 lines of stdlib Python that let a crashed job pick up where it died, fetching only the missing rows and writing zero duplicates, plus the real captured output of a run that crashes and a rerun that finishes it cleanly.

To be clear about scope: this is the run after the crash — how to restart a long job so it finishes the work it lost without re-fetching what it already pulled and without writing a row twice. It is not retry/backoff inside a single request (that's a different post of mine), not schema-drift detection (the post where I said "a crash loses the run" — this is the part where you get the run back), not a budget kill-switch that stops a runaway, and explicitly not conditional-GET / ETag / "skip unchanged pages" — that's freshness, a separate question entirely. Just: your job died mid-way, the clock and the bill are still running, how do you resume cheap and clean.

Your Scraper Died at Row 12,000. The Rerun Pattern.

Related reading

Your Scraper Collected 50 Rows. There Were 4,000.

Your scraper isn't broken. The site changed, and it didn't tell you.

Scraping millions of pages a day: what actually breaks

Scraping platform costs: measure successful rows, not browser minutes

Replayable Runs > Faster Runs. Stop Optimising for the Wrong Number.

Your recurring scraper is re-downloading data that didn't change. Here's the…