Open table formats are specifications that define how to store and manage large datasets in a structured manner on distributed storage systems. They provide a layer of abstraction over raw data files, enabling features such as ACID transactions, schema evolution, and time travel. This abstraction allows multiple processing engines to interact with the data consistently and reliably.
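To make the idea of a metadata layer over raw files more concrete, here is a minimal sketch of a snapshot log. It is a toy in-memory model, not the on-disk layout or API of any real format: the names `Snapshot`, `ToyTable`, `commit`, and `as_of`, as well as the file names, are hypothetical. It assumes that each commit publishes a new immutable snapshot listing the schema and data files, which is one common way to get atomic commits, schema evolution, and time travel.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Snapshot:
    """An immutable view of the table: a schema plus the data files it covers."""
    snapshot_id: int
    schema: Tuple[str, ...]
    data_files: Tuple[str, ...]


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for a table format's metadata log."""
    snapshots: List[Snapshot] = field(default_factory=list)

    def current(self) -> Snapshot:
        return self.snapshots[-1]

    def commit(self, expected_id: int, schema: Sequence[str],
               data_files: Sequence[str]) -> Snapshot:
        # Optimistic concurrency: reject the commit if another writer got in first.
        latest_id = self.snapshots[-1].snapshot_id if self.snapshots else 0
        if expected_id != latest_id:
            raise RuntimeError("concurrent commit detected; retry on the new snapshot")
        snap = Snapshot(latest_id + 1, tuple(schema), tuple(data_files))
        self.snapshots.append(snap)  # the single step that publishes the change
        return snap

    def as_of(self, snapshot_id: int) -> Snapshot:
        # Time travel: old snapshots are never modified, so they stay readable.
        for snap in self.snapshots:
            if snap.snapshot_id == snapshot_id:
                return snap
        raise KeyError(snapshot_id)


table = ToyTable()
v1 = table.commit(0, ["id", "name"], ["part-000.parquet"])
# Schema evolution: the new column lives in metadata; the old file is untouched.
v2 = table.commit(v1.snapshot_id, ["id", "name", "email"],
                  ["part-000.parquet", "part-001.parquet"])
```

Readers that hold `v1` keep seeing the two-column schema and the single file, while new readers pick up `v2`. A writer that still believes snapshot 1 is the latest gets an error instead of silently overwriting the newer commit.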
The primary open table formats in use today are Apache Iceberg, Delta Lake, and Apache Hudi. Each offers unique capabilities tailored to specific use cases:
Apache Iceberg originated at Netflix and was later open-sourced as an Apache Software Foundation project. It is designed for high-performance analytics and supports schema evolution, partition evolution, hidden partitioning, and time travel.
Delta Lake was created by Databricks and is now an open-source project hosted by the Linux Foundation. It emphasizes ACID transactions and is tightly integrated with the Spark ecosystem.
Apache Hudi originated at Uber and is optimized for streaming data and real-time ingestion. It supports incremental data processing and efficient upserts.
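The upsert and incremental-processing ideas mentioned above can be sketched in a few lines. This is a plain-Python illustration, not Hudi's API or storage layout: the functions `upsert` and `incremental_read`, the `id` key field, and the integer commit times are all assumptions made for the example.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]
# Hypothetical table layout: record key -> (commit time, latest record version)
KeyedTable = Dict[int, Tuple[int, Record]]


def upsert(table: KeyedTable, batch: List[Record], commit_time: int) -> None:
    """Insert new keys and overwrite existing ones, stamping rows with commit_time."""
    for record in batch:
        table[record["id"]] = (commit_time, record)


def incremental_read(table: KeyedTable, since: int) -> List[Record]:
    """Return only the rows written after the given commit time."""
    return [record for ts, record in table.values() if ts > since]


table: KeyedTable = {}
upsert(table, [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Lima"}], commit_time=1)
# The second batch updates key 2 and inserts key 3 in a single operation.
upsert(table, [{"id": 2, "city": "Quito"}, {"id": 3, "city": "Pune"}], commit_time=2)

# A downstream job that already processed commit 1 asks only for what changed.
changed = incremental_read(table, since=1)
```

Here `changed` holds the updated row for key 2 and the new row for key 3, while the untouched row for key 1 is skipped. Avoiding a full rescan in this way is what makes incremental pipelines cheaper than batch reprocessing.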







