You have been using Python, and you have written scripts that pull data, clean it, and load it somewhere. Maybe you even have a while True loop running on a server somewhere, or a cron job you are afraid to touch.

This tutorial is for you.

Apache Airflow is the tool that takes your Python script from "runs on my machine" to "runs reliably at 6 am every day, retries on failure, sends an alert if something breaks, and has a UI to see exactly what happened." It is the standard orchestration tool on most data engineering teams, and it is more approachable than it looks.

By the end of this post, you will understand how Airflow thinks, and you will have a working DAG that schedules a real data pipeline.

What Airflow actually is (and isn't)