It Depends #64: Iceberg, Delta, Hudi, Polaris, Unity & Lakehouse - Vinoth Chandar Onehouse - Jun ’24
- Added June 25, 2024
- On the occasion of Onehouse’s $35M Series B funding, I had the pleasure of discussing the intricacies of lakehouse architecture and table formats with founder and CEO, Vinoth Chandar. He created the Hudi format while at Uber and then worked at Confluent, before starting Onehouse. Our conversation delved into the origins and differences between Apache Hudi, Apache Iceberg, and Delta Lake.
The complexity behind table formats and metadata is truly remarkable. Vinoth shared insights into achieving interoperability through Databricks UniForm and Apache XTable, developed in collaboration with Microsoft and Google. We also explored technical aspects of metastores and catalogs, such as Snowflake’s Polaris and Databricks’ Unity Catalog.
Vinoth also offered his perspective on Tabular (now part of Databricks) and the future of lakehouse management. I believe you'll find this episode insightful and valuable.
Feel free to share your thoughts and feedback.
#data #datamanagement #ai #ml #cloud #multicloud #moderndatastack #cloudnative #opensource #llm #lakehouse #apacheiceberg #apachehudi #LFDelta - Science & Technology
transactional layer
iceberg - update is delete + insert
iceberg - mark the deleted records, create a file of new inserts
hudi - mor
hudi - streaming workloads
liquid clustering
table metadata formats
hudi, delta - more writes
iceberg - more read - warehouse
read, write, table management
read optimised
write optimised
hudi
lakehouse
data rewriting
3 tables formats
table metadata formats
iceberg - snapshot
delta, hudi - log based
write, read, table management
hudi - built-in table management
metadata
cow; mor (copy-on-write; merge-on-read)
interoperability
uniform
read compatibility
write compatibility
manifest file
delta log
xtable and uniform
FoundationDB in Snowflake
access control + technical metadata
polaris - Snowflake
unity catalog - open source
iceberg rest api catalog
metastore - read and write plans
data not tied to one engine
data catalog being interoperable between engines
lake view - ingestion, transformation, orchestration - observability
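To make the "update is delete + insert" and "cow; mor" notes concrete, here is a rough toy sketch of the two write strategies. It is not how Iceberg, Hudi, or Delta actually implement them: the class names, the in-memory dicts standing in for data files, and the `compact` helper are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CopyOnWriteTable:
    """Toy copy-on-write table (hypothetical): an update rewrites the data file."""
    data_file: dict = field(default_factory=dict)  # key -> row, stands in for one file
    files_rewritten: int = 0

    def upsert(self, key, row):
        new_file = dict(self.data_file)  # copy every existing row into a new file
        new_file[key] = row
        self.data_file = new_file
        self.files_rewritten += 1        # write cost is paid up front

    def read(self):
        return dict(self.data_file)      # reads need no merging


@dataclass
class MergeOnReadTable:
    """Toy merge-on-read table (hypothetical): an update is a delete marker plus an insert."""
    base_file: dict = field(default_factory=dict)
    deleted_keys: set = field(default_factory=set)   # stands in for a delete file
    insert_file: dict = field(default_factory=dict)  # stands in for a file of new inserts

    def upsert(self, key, row):
        if key in self.base_file:
            self.deleted_keys.add(key)   # mark the old record as deleted
        self.insert_file[key] = row      # write the new version separately

    def read(self):
        # Merge at read time: drop deleted rows, then overlay the new inserts.
        merged = {k: v for k, v in self.base_file.items() if k not in self.deleted_keys}
        merged.update(self.insert_file)
        return merged

    def compact(self):
        # Table management step: fold the pending changes back into the base file.
        self.base_file = self.read()
        self.deleted_keys = set()
        self.insert_file = {}
```

In this sketch the copy-on-write table pays the cost on every write and keeps reads simple, while the merge-on-read table keeps writes cheap and defers the work to reads or to a later compaction, which lines up with the "write optimised" versus "read optimised" notes above.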