Friday, March 1, 2024

Introduction to Apache Iceberg. Exploring Apache Iceberg… | by Pier Paolo Ippolito | Feb, 2024

Thanks to the arrival of Data Lakes made easily accessible by cloud providers such as GCP, Azure, and AWS, more and more organizations have been able to cheaply store their unstructured data. However, Data Lakes come with many limitations, such as:

  • Inconsistent reads can happen when mixing batch and streaming or appending new data.
  • Fine-grained modification of existing data can become complex (e.g. to meet GDPR requirements).
  • Performance degradation when handling millions of small files.
  • No ACID (Atomicity, Consistency, Isolation, Durability) transaction support.
  • No schema enforcement/evolution.

To try to alleviate these issues, Apache Iceberg was created at Netflix in 2017. Apache Iceberg is a table format that provides an additional layer of abstraction to support ACID transactions, time travel, etc. while working with various kinds of data sources and workloads. The main purpose of a table format is to define a protocol on how to best manage and organize all the files composing a table. Apart from Apache Iceberg, other currently popular open table formats are Hudi and Delta Lake.
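To make the idea of a table format more concrete, here is a minimal Python sketch of snapshot-based commits and time travel. It is only a toy model under simplifying assumptions: the names `Snapshot`, `ToyTable`, `commit_append`, and `scan` are hypothetical and are not part of the Iceberg API, and everything lives in memory rather than in files on object storage.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    """An immutable view of the table: the full set of data files at one commit."""
    snapshot_id: int
    files: Tuple[str, ...]


@dataclass
class ToyTable:
    """Hypothetical toy table (not the Iceberg API)."""
    snapshots: List[Snapshot] = field(default_factory=list)

    def commit_append(self, new_files: List[str]) -> Snapshot:
        # Build the new snapshot completely before publishing it, so a reader
        # sees either the previous table state or the new one, never a mix.
        current = self.snapshots[-1].files if self.snapshots else ()
        snap = Snapshot(len(self.snapshots) + 1, current + tuple(new_files))
        self.snapshots.append(snap)  # the single "publish" step
        return snap

    def scan(self, snapshot_id: Optional[int] = None) -> Tuple[str, ...]:
        # By default read the latest snapshot; an older id gives "time travel".
        if not self.snapshots:
            return ()
        if snapshot_id is None:
            return self.snapshots[-1].files
        for snap in self.snapshots:
            if snap.snapshot_id == snapshot_id:
                return snap.files
        raise KeyError(f"unknown snapshot {snapshot_id}")


table = ToyTable()
first = table.commit_append(["data/a.parquet"])
table.commit_append(["data/b.parquet", "data/c.parquet"])

latest_files = table.scan()                  # all three files
old_files = table.scan(first.snapshot_id)    # only the first file
```

Because existing snapshots are never modified, reading an old snapshot id keeps working after later commits, which is the basic intuition behind time travel in this kind of design.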

For example, Apache Iceberg and Delta Lake mostly have the same characteristics, although Iceberg also supports other file formats such as ORC and Avro. Delta Lake, on the other hand, is currently heavily supported by Databricks and the open-source community and provides a greater variety of APIs (Figure 1).

Figure 1: Apache Iceberg vs Delta Lake (Image by Author).

Over the years, Apache Iceberg has been open-sourced by Netflix, and many other companies such as Snowflake and Dremio have decided to invest in the project.

Each Apache Iceberg table follows a three-layer architecture:

  • Iceberg Catalog
  • Metadata Layer (with metadata files, manifest lists, and manifest files)
  • Data Layer
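The way these layers reference each other can be sketched in a few lines of Python. This is a deliberately simplified illustration, not Iceberg's actual on-disk format or API: the class names, the `plan_scan` helper, the table name, and the file paths are all hypothetical, and a real metadata file tracks multiple snapshots rather than a single manifest list.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ManifestFile:
    """Tracks a group of data files (paths only in this sketch)."""
    data_files: List[str]


@dataclass
class ManifestList:
    """Lists the manifest files that make up one snapshot of the table."""
    manifests: List[ManifestFile]


@dataclass
class MetadataFile:
    """Table-level metadata: schema plus a pointer to the current manifest list."""
    schema: Dict[str, str]
    current_snapshot: ManifestList


class Catalog:
    """Maps a table name to its current metadata file."""

    def __init__(self) -> None:
        self._pointers: Dict[str, MetadataFile] = {}

    def register(self, name: str, metadata: MetadataFile) -> None:
        self._pointers[name] = metadata

    def load(self, name: str) -> MetadataFile:
        return self._pointers[name]


def plan_scan(catalog: Catalog, table_name: str) -> List[str]:
    # Walk catalog -> metadata file -> manifest list -> manifest files -> data files.
    metadata = catalog.load(table_name)
    return [
        path
        for manifest in metadata.current_snapshot.manifests
        for path in manifest.data_files
    ]


catalog = Catalog()
catalog.register(
    "db.events",
    MetadataFile(
        schema={"id": "long", "ts": "timestamp"},
        current_snapshot=ManifestList(
            manifests=[
                ManifestFile(["events/f1.parquet", "events/f2.parquet"]),
                ManifestFile(["events/f3.parquet"]),
            ]
        ),
    ),
)

paths = plan_scan(catalog, "db.events")
```

The point of the sketch is that a reader never lists a storage directory: it starts from the catalog and follows pointers down through the metadata until it reaches the data files.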
