Dipankar Mazumdar is the Director of Developer Relations at Cloudera, leading global developer initiatives across lakehouse architecture and AI. He previously held advocacy and engineering roles at Dremio, Onehouse, and Qlik, contributing to open source projects including Apache Iceberg, Apache Hudi, and Apache XTable (incubating), and building communities. His work focuses on the intersection of data engineering and AI. He is the author of Engineering Lakehouses with Open Table Formats and a contributor to Apache Iceberg: The Definitive Guide.
Apache Iceberg has quickly become a foundational technology in modern data architectures, but its impact goes far beyond performance and scale. This conversation with Dipankar explores how Iceberg redefined the data lake and how community, education, and open collaboration fueled its adoption.
What Is Apache Iceberg and Why It Exists
Q: Can you tell us a bit about Apache Iceberg?
A: Apache Iceberg is a high-performance open table format for huge analytic datasets. It was designed to bring reliability and simplicity to data lakes, allowing multiple engines to safely read from and write to the same datasets with strong transactional guarantees. By introducing a table abstraction on top of raw data files, Iceberg helps organizations manage large-scale data with the consistency typically associated with data warehouses while retaining the flexibility of data lakes.
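To make the idea of a table abstraction over raw data files more concrete, here is a minimal toy sketch in Python. It is not Iceberg's API or file layout: the names ToyCatalog, Snapshot, and commit_append are hypothetical and exist only for illustration. It assumes a simplified model in which each table version is an immutable list of data files and a writer commits by swapping a single "current snapshot" pointer, which fails if another writer got there first. Real Iceberg tracks this state through metadata files, manifests, and an atomic swap in a catalog.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """One immutable table version: the full list of data files it contains."""
    snapshot_id: int
    data_files: tuple[str, ...]


@dataclass
class ToyCatalog:
    """Hypothetical stand-in for a catalog: it owns the only mutable state,
    the pointer to the current snapshot."""
    snapshots: dict[int, Snapshot] = field(default_factory=dict)
    current_id: Optional[int] = None

    def current(self) -> Optional[Snapshot]:
        """Return the snapshot readers should see right now, if any."""
        if self.current_id is None:
            return None
        return self.snapshots[self.current_id]

    def commit_append(self, expected_id: Optional[int], new_files: list[str]) -> bool:
        """Append files as a new snapshot, but only if the table has not
        changed since the writer read it (a compare-and-swap on the pointer)."""
        if expected_id != self.current_id:
            return False  # another writer committed first; caller must retry
        base = self.current()
        files = (base.data_files if base else ()) + tuple(new_files)
        new_id = 1 if self.current_id is None else self.current_id + 1
        self.snapshots[new_id] = Snapshot(new_id, files)
        self.current_id = new_id  # the single atomic step that publishes the change
        return True


catalog = ToyCatalog()
catalog.commit_append(None, ["data/a.parquet"])        # first writer succeeds
reader_view = catalog.current()                        # a reader pins snapshot 1
catalog.commit_append(1, ["data/b.parquet"])           # second commit succeeds
stale_ok = catalog.commit_append(1, ["data/c.parquet"])  # stale writer is rejected

print(reader_view.data_files)        # the pinned reader still sees only a.parquet
print(catalog.current().data_files)  # new readers see a.parquet and b.parquet
print(stale_ok)                      # False
```

The point of the sketch is the design choice rather than the details: because snapshots are never modified in place, a reader keeps a consistent view while writers publish new versions, and conflicting writers are detected at commit time instead of corrupting shared files.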











