Kafka for Data Engineers: Core Concepts, KRaft, and the Patterns That Actually Work

If your Kafka Docker Compose still has a ZooKeeper service in it, your setup is already legacy. As of Kafka 4.0 (released March 2025), ZooKeeper is gone. The architecture changed, the config changed, and the setup you learned two years ago will not work with any Kafka 4.x image.

This guide covers Kafka from the ground up: what it is, how it works, how to run it locally in 2026 with KRaft, and the Python patterns that hold up in production. It also covers the gotchas that waste days when you're new to it.

What Kafka Is and Why Data Engineers Use It

Kafka is a distributed event streaming platform. The core abstraction is a log: an ordered, append-only sequence of records. Producers append to the log. Consumers read from it, each tracking its own position (its offset). Reading does not remove records; the log is retained for a configurable period (7 days by default), so multiple consumers can independently read the same data at their own pace.
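To make the log model concrete, here is a minimal in-memory sketch. It is not Kafka client code: the `Log` and `Consumer` classes are hypothetical names invented for illustration, and retention, partitioning, and networking are all left out. It only shows the one property described above, that reads do not consume records and each consumer advances its own offset.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """Toy stand-in for an append-only log (hypothetical, not a Kafka API)."""
    records: list = field(default_factory=list)

    def append(self, value):
        """Append a record and return its offset (its position in the log)."""
        self.records.append(value)
        return len(self.records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records starting at offset; nothing is removed."""
        return self.records[offset:offset + max_records]


@dataclass
class Consumer:
    """Tracks its own read position, independent of any other consumer."""
    log: Log
    offset: int = 0

    def poll(self, max_records=10):
        """Read the next batch and advance this consumer's offset past it."""
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = Log()
for event in ["signup", "login", "purchase"]:
    log.append(event)

fast = Consumer(log)
slow = Consumer(log)

fast_batch = fast.poll()               # reads everything currently in the log
slow_batch = slow.poll(max_records=1)  # reads only the first record
```

After this runs, `fast` is at offset 3 and `slow` is at offset 1, while the log still holds all three records. The slow consumer can catch up later without affecting the fast one, which is the behavior the retention window makes possible in real Kafka.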

For data engineers, this matters in three scenarios:

Related reading

Apache Kafka Explained: A Practical Beginner Guide for Data Engineers

Kafka without ZooKeeper: My Strimzi HA Playbook on K8s

Apache Kafka End of Life: Kafka Versions EOL Every 4 Months — Are You Behind?

Great Stack to Doesn't Work #2 — Kafka: "Where Did My Messages Go?"

Apache Kafka for Beginners: Building Real-Time Streaming Systems with Python

Shifting from Databases to Kafka: How to Build an Indestructible Data Pipeline
