Most Python tutorials teach you to write Python. This one teaches you to write pipelines.

The code is different. A pipeline has to be correct at 7am when no one is watching, recover gracefully from a flaky API, not destroy existing data on a re-run, and leave enough logs that you can diagnose a failure three days later. The language is the same; the patterns are not.
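Two of those requirements, surviving a flaky API and leaving a useful log trail, fit in a few lines. Here is a minimal sketch, assuming the API call can be wrapped in a zero-argument function. `with_retries` and its parameters are hypothetical names for illustration, not part of requests, Airflow, or any other library. Which exceptions count as transient is also an assumption: this sketch uses the built-in `ConnectionError` and `TimeoutError`. With requests you would pass its own classes instead, such as `requests.exceptions.ConnectionError` and `requests.exceptions.Timeout`.

```python
import logging
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

logger = logging.getLogger("pipeline")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call fn, retrying transient failures with exponential backoff.

    Hypothetical helper for illustration; not from any library.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            if attempt == attempts:
                # Out of attempts: log the traceback and let the failure surface.
                logger.exception("giving up after %d attempts", attempts)
                raise
            # Double the wait after each failure: base_delay, 2x, 4x, ...
            delay = base_delay * 2 ** (attempt - 1)
            # Record what failed and when, so the run can be diagnosed later.
            logger.warning(
                "attempt %d/%d failed (%r); retrying in %.2fs",
                attempt, attempts, exc, delay,
            )
            time.sleep(delay)
    raise AssertionError("unreachable")


# Usage: a stand-in for an API that fails twice before succeeding.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated network blip")
    return "ok"


logging.basicConfig(level=logging.INFO)
print(with_retries(flaky, base_delay=0.01))  # prints "ok" after two logged retries
```

Exceptions outside `retry_on`, such as a `ValueError` from bad input, propagate on the first attempt. That is deliberate: retrying a deterministic bug only delays the failure.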

I've built pipelines processing anywhere from a few thousand rows to 1.5 million. The stack is always some combination of requests, pandas, SQLAlchemy, dbt, pytest, and Airflow. Here are the patterns that show up in every one.

Project Structure

Every pipeline project I start looks like this: