From Denormalization to JOINS: Why ClickHouse Can't Keep Up
- Added Aug 6, 2024
- ClickHouse has long been praised for its performance, but that performance is limited to the local maximum offered by solutions dependent on denormalization. Significant advances in JOIN technology now allow you to ditch denormalization and enjoy record-setting performance improvements in return.
Join our data engineering expert, Sida Shen, for this insightful review of what’s new when it comes to JOINs and why now is the time to graduate from denormalization and solutions like ClickHouse.
Highlights:
🌟Why denormalization is required if you are using ClickHouse
🌟What costs and challenges come with denormalization, especially in real-time analytics
🌟How StarRocks replaces denormalization with on-the-fly JOINs
🌟Where the technical differences are between StarRocks and ClickHouse and which is right for you
If ClickHouse is no longer cutting it, or you’re tired of being held back by denormalization, this webinar offers you a way forward.
-----------------------------------------------------------------------------------------------------------------------
Timestamps
00:00 Intro
00:26 Agenda
01:25 Data Modeling Best Practices - Normalization VS Denormalization
03:41 The Cost of Denormalization
05:58 Complex Real-Time Data Pipeline
07:14 Introducing StarRocks
08:24 SSB Benchmark Test - StarRocks VS. ClickHouse VS. Druid
10:19 TPC-DS Benchmark Test - StarRocks VS. Trino
11:23 Airbnb Case Study
13:22 Tencent Games Case Study
14:57 How Queries Work - From SQL Query to Result
16:42 Query Planning
18:39 ClickHouse Query Planner - Rule-Based Optimizer
19:49 StarRocks Query Planning - Cost-Based Optimizer
21:02 Data Pruning - Global Runtime filter
23:10 Compute Architecture - How Does It Affect JOINs?
23:22 JOIN-Related Concepts
25:15 How To Execute JOINs at Scale
27:35 Local JOINs - Collocated JOIN
28:19 Distributed JOINs - Broadcast JOIN
29:33 Distributed JOINs - Shuffle JOIN
30:22 Distributed JOINs - Bucket Shuffle JOIN
30:52 Recap: JOIN Strategies
32:07 Compute Architecture - Scatter/Gather, Map Reduce and MPP
34:10 StarRocks Architecture
35:22 StarRocks vs ClickHouse
37:10 Q & A
37:24 How different is the query optimizer, including JOINs, from the Spark optimizer? Was there any motivation from other optimizers while building StarRocks?
38:27 Why do I see ClickHouse outperform StarRocks on ClickBench when your data say otherwise?
39:21 If the internal storage and the compute node is decoupled, doesn't it increase the network overhead? What is the recommended design?
40:53 Can you speak to the join algorithms and strategies of each database?
43:16 Are there any drawbacks with shuffle join?
44:20 Where can I get the performance benchmarks?
44:52 Is there any active development work for improving StarRocks joins and, more generally, the optimizer?
-----------------------------------------------------------------------------------------------------------------------
Learn more at celerdata.com/
Connect with us:
LinkedIn: / celerdata
Twitter: / celerdata
StarRocks GitHub: github.com/StarRocks/StarRocks
StarRocks Website: www.starrocks.io/
Slack: try.starrocks.com/join-starro...
#DataAnalytics #DataEngineering #RealTimeAnalytics #RealTimeData #OLAP #DataAnalyst #DataEngineer #DataInfrastructure #UserFacingAnalytics #Database #AnalyticalDatabase #Denormalization #DataScience #ClickHouse #ApacheDruid #Trino
Useful Links:
🌟Join StarRocks on Slack: try.starrocks.com/join-starrocks-on-slack
🌟[Benchmark Report] StarRocks' Queries Outperform ClickHouse, Apache Druid, and Trino: celerdata.com/blog/starrocks-queries-outperform-clickhouse-apache-druid-and-trino
🌟[Airbnb Case Study] Airbnb Builds a New Generation of Fast Analytics Experience with StarRocks: celerdata.com/blog/airbnb-builds-a-new-generation-of-fast-analytics-experience-with-starrocks
🌟[Tencent Games Case Study] Tencent Games' Analytics With StarRocks - czcams.com/video/VoSGq3jkY2c/video.html
🌟ClickHouse vs. StarRocks: celerdata.com/clickhouse-alternatives-comparisons
🌟Try CelerData Cloud for Free: celerdata.com/celerdata-cloud-free-trial
📄 Access the Detailed Transcript: celerdata.com/blog/from-denormalization-to-joins-why-clickhouse-cannot-keep-up
How does StarRocks compare with Databricks with the Photon engine?
StarRocks delivers around 2x the performance of Databricks SQL with Photon, and it is 100% open source.