If your Kafka Docker Compose file still has a ZooKeeper service in it, your setup is already legacy. Kafka 4.0 (released March 2025) removed ZooKeeper support entirely; clusters now run only in KRaft mode, where Kafka manages its own metadata. The architecture changed, the config changed, and the setup you learned two years ago will not work with any Kafka 4.x image.

This guide covers Kafka from the ground up: what it is, how it works, how to run it locally in 2026 with KRaft, and the Python patterns that hold up in production. It also covers the gotchas that waste days when you're new to Kafka.

What Kafka Is and Why Data Engineers Use It

Kafka is a distributed event streaming platform. The core abstraction is a log: an ordered, append-only sequence of records. Producers write to the end of the log. Consumers read from it, and reading does not remove anything. Records are retained for a configurable period (7 days by default, via the broker setting log.retention.hours=168), so multiple consumers can independently read the same data at their own pace.
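
To make the log abstraction concrete, here is a minimal in-memory sketch in plain Python. It is not Kafka and uses no Kafka client; the ToyLog class, the event names, and the "analytics" and "billing" consumers are all made up for illustration. It only shows the idea that records are appended at the end and that each consumer keeps its own read position.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLog:
    """Hypothetical in-memory stand-in for one append-only log (not a Kafka API)."""
    records: list = field(default_factory=list)

    def append(self, value):
        """Append a record at the end and return its offset (its position)."""
        self.records.append(value)
        return len(self.records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records starting at offset; the log is not modified."""
        return self.records[offset:offset + max_records]


log = ToyLog()
for event in ["order_created", "order_paid", "order_shipped"]:
    log.append(event)

# Each consumer tracks its own position; reading never removes records.
positions = {"analytics": 0, "billing": 0}

# The analytics consumer reads slowly, two records at a time.
analytics_batch = log.read(positions["analytics"], max_records=2)
positions["analytics"] += len(analytics_batch)

# The billing consumer reads everything available, unaffected by analytics.
billing_batch = log.read(positions["billing"])
positions["billing"] += len(billing_batch)

print(positions)
```

Both consumers see the same records from the start, and neither one's progress affects the other. Real Kafka adds retention, replication, and partitioning on top of this idea, but the read-position-per-consumer model is the same.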

For data engineers, this matters in three scenarios: