Apache Iceberg looked like the answer to everything when we first adopted it. Open format, ACID transactions, time travel, schema evolution. We migrated our Hive tables, ran a few queries, and felt good about life.

Three months later, our S3 costs doubled. Queries that used to take 10 seconds were taking 4 minutes. Metadata operations were timing out. Nobody on the team could explain why.

That was the beginning of a real education in how Iceberg actually behaves in production. This post covers what I wish someone had told us before we went all-in.

The Small Files Problem Is Not Optional

Iceberg is append-friendly by design. Every micro-batch write, every streaming insert, every incremental load creates new Parquet files. Each file also gets its own metadata entry.