Use a catalog-linked database for Apache Iceberg tables
Apache Iceberg is a table format that works like a smart filing system for big data tables.
It keeps track of all your files, versions, and snapshots so querying data is fast and reliable.
🗂️ What’s a Catalog in Iceberg?
Think of the catalog as the master notebook where Iceberg writes:
- Where your tables are stored
- What files belong to each table
- The schema
- The table versions
- The metadata
Examples of catalogs: AWS Glue, Hive Metastore, Nessie, REST Catalog, Snowflake, etc.
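To make the "master notebook" idea concrete, here is a minimal toy model in plain Python. It is only a sketch: `ToyCatalog`, `TableEntry`, the table name, and the bucket path are all made up for illustration. A real Iceberg catalog mostly stores a pointer to each table's current metadata file, and that metadata tracks the schema, snapshots, and data files; the toy folds both into one object.

```python
from dataclasses import dataclass, field


@dataclass
class TableEntry:
    """What this toy catalog remembers about one table (hypothetical model)."""
    location: str
    schema: dict
    snapshots: list = field(default_factory=list)  # one list of data files per version

    def current_files(self):
        # The newest snapshot describes the current state of the table.
        return self.snapshots[-1] if self.snapshots else []


class ToyCatalog:
    """The 'master notebook': maps table names to their entries."""

    def __init__(self):
        self.tables = {}

    def register(self, name, location, schema):
        self.tables[name] = TableEntry(location, schema)

    def append_files(self, name, new_files):
        # Each write adds a new snapshot; older snapshots stay available.
        entry = self.tables[name]
        entry.snapshots.append(entry.current_files() + list(new_files))


catalog = ToyCatalog()
catalog.register("toys.cars", "s3://example-bucket/toys/cars",
                 {"id": "int", "name": "string"})
catalog.append_files("toys.cars", ["part-0001.parquet"])
catalog.append_files("toys.cars", ["part-0002.parquet"])

print(len(catalog.tables["toys.cars"].snapshots))
print(catalog.tables["toys.cars"].current_files())
```

Keeping every snapshot instead of overwriting the file list is what lets you look at an earlier version of the table later.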
🏷️ What is a catalog-linked database?
Imagine you want to organize your toys in boxes.
You don’t write directly on the toy box; instead, you write in a notebook:
- Box 1 → Cars
- Box 2 → Legos
- Box 3 → Action figures
In Iceberg, a catalog-linked database is this kind of organized grouping inside the catalog (Iceberg catalogs call such a grouping a namespace).
It means:
Your database in Spark or Flink is connected to a catalog. All tables you create inside that database automatically become Iceberg tables managed by that catalog.
Think of it like this:
- The catalog = a big library system.
- A catalog-linked database = a section in that library, like “Kids Books”.
- Iceberg tables = the actual books.
When you create a table in that database (section):
📘 → It is automatically registered in the catalog (library system)
📚 → Iceberg manages how the data files are stored
🗂️ → Everything stays organized
So instead of you manually telling Iceberg where every book is, the catalog-linked database takes care of that automatically.
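The automatic registration described above can be sketched with a small toy class. This is not how any real engine implements it: `LinkedDatabase`, the plain `dict` standing in for the catalog, and the warehouse path are assumptions chosen to mirror the library analogy.

```python
class LinkedDatabase:
    """Hypothetical model of a database (library section) bound to one catalog."""

    def __init__(self, name, catalog, warehouse):
        self.name = name
        self.catalog = catalog      # plain dict standing in for the catalog
        self.warehouse = warehouse  # root path where table data would live

    def create_table(self, table, schema):
        full_name = f"{self.name}.{table}"
        if full_name in self.catalog:
            raise ValueError(f"table already exists: {full_name}")
        # Registration happens as part of creation: the caller never
        # touches the catalog or picks a storage location by hand.
        self.catalog[full_name] = {
            "format": "iceberg",
            "location": f"{self.warehouse}/{self.name}/{table}",
            "schema": schema,
        }
        return full_name


catalog = {}
kids_books = LinkedDatabase("kids_books", catalog, "s3://example-bucket/warehouse")
kids_books.create_table("picture_books", {"title": "string", "pages": "int"})

print(sorted(catalog))
print(catalog["kids_books.picture_books"]["location"])
```

The point of the design is that `create_table` is the only call you make; the catalog entry and the storage path follow from the database the table lives in.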
🧐 Why do people use catalog-linked databases?
Because:
- You don’t have to specify catalog settings every time
- All tables in that database are Iceberg tables by default
- Easier to organize tables
- Cleaner project structure
- Less code and fewer mistakes
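As a rough illustration of "specify catalog settings once", the sketch below builds the Spark properties for an Iceberg catalog in one place and reuses them for the whole session. The property keys are the standard Iceberg Spark settings; the catalog name `demo`, the warehouse path, the `hadoop` catalog type, and the helper names are assumptions for the example. Actually starting the session also needs `pyspark` and the Iceberg Spark runtime jar on the classpath, so those lines are left commented out.

```python
def iceberg_catalog_conf(catalog_name, warehouse, catalog_type="hadoop"):
    """Return the Spark properties that define one Iceberg catalog."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": catalog_type,
        f"{prefix}.warehouse": warehouse,
    }


def build_session(conf):
    """Create a SparkSession with the given properties (requires pyspark)."""
    from pyspark.sql import SparkSession  # imported lazily so the helper above works without Spark

    builder = SparkSession.builder.appName("iceberg-demo")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


conf = iceberg_catalog_conf("demo", "/tmp/iceberg-warehouse")
for key in sorted(conf):
    print(key, "=", conf[key])

# With Spark and the Iceberg runtime available, the settings are applied once
# and every table in the database goes through the same catalog:
# spark = build_session(conf)
# spark.sql("CREATE NAMESPACE IF NOT EXISTS demo.kids_books")
# spark.sql("CREATE TABLE demo.kids_books.picture_books "
#           "(title STRING, pages INT) USING iceberg")
```

After that, tables are addressed as `demo.kids_books.<table>` with no further catalog configuration in each statement.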