Building on open table formats

Open table formats are specifications that define how to store and manage large datasets in a structured manner on distributed storage systems. They provide a layer of abstraction over raw data files, enabling features such as ACID transactions, schema evolution, and time travel. This abstraction allows multiple processing engines to interact with the data consistently and reliably.
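To make the abstraction concrete, here is a minimal toy sketch in Python of a metadata layer that tracks immutable snapshots over data files. It is not the API of Iceberg, Delta Lake, or Hudi; `Snapshot`, `ToyTable`, `commit`, and `as_of` are hypothetical names chosen for illustration, and the in-memory list stands in for the metadata files a real format would persist. It only shows how atomic commits, schema evolution, and time travel can all fall out of one idea: every change produces a new table version rather than mutating the old one.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Snapshot:
    """One immutable table version: a schema plus the data files it covers."""
    snapshot_id: int
    schema: Tuple[str, ...]
    data_files: Tuple[str, ...]


@dataclass
class ToyTable:
    """Hypothetical metadata layer; real formats persist this state as files."""
    snapshots: List[Snapshot] = field(default_factory=list)

    def latest_id(self) -> int:
        # 0 means "empty table, nothing committed yet".
        return self.snapshots[-1].snapshot_id if self.snapshots else 0

    def current(self) -> Snapshot:
        return self.snapshots[-1]

    def commit(
        self,
        expected_id: int,
        new_files: Sequence[str] = (),
        new_columns: Sequence[str] = (),
    ) -> Snapshot:
        """Optimistic commit: succeed only if no one committed since expected_id."""
        if expected_id != self.latest_id():
            raise RuntimeError("concurrent commit detected; retry on the new snapshot")
        base = self.snapshots[-1] if self.snapshots else Snapshot(0, (), ())
        snap = Snapshot(
            snapshot_id=base.snapshot_id + 1,
            schema=base.schema + tuple(new_columns),         # schema evolution
            data_files=base.data_files + tuple(new_files),   # old files untouched
        )
        self.snapshots.append(snap)  # the single "atomic" step in this toy model
        return snap

    def as_of(self, snapshot_id: int) -> Snapshot:
        """Time travel: read the table exactly as it was at an earlier version."""
        for snap in self.snapshots:
            if snap.snapshot_id == snapshot_id:
                return snap
        raise KeyError(snapshot_id)


table = ToyTable()
table.commit(0, new_files=["part-0.parquet"], new_columns=["id", "name"])
table.commit(1, new_files=["part-1.parquet"], new_columns=["email"])
```

After these two commits, `table.as_of(1)` still reports the original two-column schema and a single data file, while `table.current()` sees both files and the added `email` column. A writer that tries to commit against a stale `expected_id` gets an error instead of silently overwriting another writer's work, which is the essence of the optimistic concurrency that these formats rely on.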

The primary open table formats in use today are Apache Iceberg, Delta Lake, and Apache Hudi. Each offers unique capabilities tailored to specific use cases:

Apache Iceberg was created at Netflix and later donated to the Apache Software Foundation. It is designed for high-performance analytics and supports schema evolution, partition evolution, hidden partitioning, and time travel.

Delta Lake was created by Databricks and is now hosted by the Linux Foundation. It emphasizes ACID transactions and is tightly integrated with the Spark ecosystem.

Apache Hudi originated at Uber and is optimized for streaming data and real-time ingestion. It supports incremental data processing and efficient upserts.
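An upsert means "update the record if its key already exists, otherwise insert it." The sketch below illustrates that merge rule in plain Python. It is not Hudi's API: the function name `upsert` and the field names `id` and `ts` are assumptions made for the example, loosely analogous to a record key and an ordering field that decides which version of a record wins.

```python
from typing import Dict, Iterable, List

Record = Dict[str, object]


def upsert(
    existing: Iterable[Record],
    incoming: Iterable[Record],
    key: str = "id",       # hypothetical record-key field
    ordering: str = "ts",  # hypothetical field used to pick the newest version
) -> List[Record]:
    """Merge incoming records into existing ones by key.

    A matching key is updated only if the incoming record is at least as
    new as the stored one; an unseen key is inserted.
    """
    merged: Dict[object, Record] = {}
    for record in list(existing) + list(incoming):
        stored = merged.get(record[key])
        if stored is None or record[ordering] >= stored[ordering]:
            merged[record[key]] = record
    return list(merged.values())


existing = [
    {"id": 1, "ts": 1, "city": "Oslo"},
    {"id": 2, "ts": 1, "city": "Lima"},
]
incoming = [
    {"id": 2, "ts": 2, "city": "Quito"},  # newer version of id 2: update
    {"id": 3, "ts": 2, "city": "Pune"},   # unseen key: insert
    {"id": 1, "ts": 0, "city": "Rome"},   # late, older version of id 1: ignored
]
result = upsert(existing, incoming)
```

Here `result` holds three records: id 1 keeps `Oslo` because the late arrival is older, id 2 becomes `Quito`, and id 3 is added. A real table format does the same thing at file level, rewriting or logging only the affected records rather than the whole dataset, which is what makes upserts efficient for streaming ingestion.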
