This is Part 4 of a 15-part Apache Iceberg Masterclass. Part 3 covered metadata-driven performance. This article explains how Iceberg handles the problem that has plagued data lakes for over a decade: what happens when your partition strategy needs to change.

Partitioning determines how data is physically organized in storage, and it is the single most impactful factor for query performance on large tables. Get it right and queries skip 95% of the data. Get it wrong and every query scans everything. The problem is that requirements change, data volumes grow, and the partition strategy that worked last year becomes a bottleneck this year.

Table of Contents

What Are Table Formats and Why Were They Needed?

The Metadata Structure of Current Table Formats