From Denormalization to JOINS: Why ClickHouse Can't Keep Up

  • Added Aug 6, 2024
  • ClickHouse has long been praised for its performance, but that performance is limited to the local maximum offered by solutions dependent on denormalization. Significant advances in JOIN technology now allow you to ditch denormalization and enjoy record-setting performance improvements in return.
    Join our data engineering expert, Sida Shen, for this insightful review of what’s new when it comes to JOINs and why now is the time to graduate from denormalization and solutions like ClickHouse.
    Highlights:
    🌟Why denormalization is required if you are using ClickHouse
    🌟What costs and challenges come with denormalization, especially in real-time analytics
    🌟How StarRocks replaces denormalization with on-the-fly JOINs
    🌟Where the technical differences are between StarRocks and ClickHouse and which is right for you
    If ClickHouse is no longer cutting it, or you’re tired of being held back by denormalization, this webinar offers you a way forward.
    -----------------------------------------------------------------------------------------------------------------------
    Timestamps
    00:00 Intro
    00:26 Agenda
    01:25 Data Modeling Best Practices - Normalization vs. Denormalization
    03:41 The Cost of Denormalization
    05:58 Complex Real-Time Data Pipeline
    07:14 Introducing StarRocks
    08:24 SSB Benchmark Test - StarRocks vs. ClickHouse vs. Druid
    10:19 TPC-DS Benchmark Test - StarRocks vs. Trino
    11:23 Airbnb Case Study
    13:22 Tencent Games Case Study
    14:57 How Queries Work - From SQL Query to Result
    16:42 Query Planning
    18:39 ClickHouse Query Planner - Rule-Based Optimizer
    19:49 StarRocks Query Planning - Cost-Based Optimizer
    21:02 Data Pruning - Global Runtime Filter
    23:10 Compute Architecture - How Does It Affect JOINs?
    23:22 JOIN-Related Concepts
    25:15 How To Execute JOINs at Scale
    27:35 Local JOINs - Collocated JOIN
    28:19 Distributed JOINs - Broadcast JOIN
    29:33 Distributed JOINs - Shuffle JOIN
    30:22 Distributed JOINs - Bucket Shuffle JOIN
    30:52 Recap: JOIN Strategies
    32:07 Compute Architecture - Scatter/Gather, MapReduce, and MPP
    34:10 StarRocks Architecture
    35:22 StarRocks vs ClickHouse
    37:10 Q & A
    37:24 How different is the query optimizer, including JOINs, from the Spark optimizer? Was there any motivation from other optimizers while building StarRocks?
    38:27 Why do I see ClickHouse outperform StarRocks on ClickBench when your data say otherwise?
    39:21 If the internal storage and the compute nodes are decoupled, doesn't that increase the network overhead? What is the recommended design?
    40:53 Can you speak to the join algorithms and strategies of each database?
    43:16 Are there any drawbacks with shuffle join?
    44:20 Where can I get the performance benchmarks?
    44:52 Is there any active development work on improving StarRocks joins and, more generally, the optimizer?
    -----------------------------------------------------------------------------------------------------------------------
    Learn more at celerdata.com/
    Connect with us:
    LinkedIn: / celerdata
    Twitter: / celerdata
    StarRocks GitHub: github.com/StarRocks/StarRocks
    StarRocks Website: www.starrocks.io/
    Slack: try.starrocks.com/join-starro...
    #DataAnalytics #DataEngineering #RealTimeAnalytics #RealTimeData #OLAP #DataAnalyst #DataEngineer #DataInfrastructure #UserFacingAnalytics #Database #AnalyticalDatabase #Denormalization #DataScience #ClickHouse #ApacheDruid #Trino

Comments • 3

  • @celerdata  9 months ago +3

    Useful Links:
    🌟Join StarRocks on Slack: try.starrocks.com/join-starrocks-on-slack
    🌟[Benchmark Report] StarRocks' Queries Outperform ClickHouse, Apache Druid, and Trino: celerdata.com/blog/starrocks-queries-outperform-clickhouse-apache-druid-and-trino
    🌟[Airbnb Case Study] Airbnb Builds a New Generation of Fast Analytics Experience with StarRocks: celerdata.com/blog/airbnb-builds-a-new-generation-of-fast-analytics-experience-with-starrocks
    🌟[Tencent Games Case Study] Tencent Games' Analytics With StarRocks - czcams.com/video/VoSGq3jkY2c/video.html
    🌟ClickHouse vs. StarRocks: celerdata.com/clickhouse-alternatives-comparisons
    🌟Try CelerData Cloud for Free: celerdata.com/celerdata-cloud-free-trial
    📄 Access the Detailed Transcript: celerdata.com/blog/from-denormalization-to-joins-why-clickhouse-cannot-keep-up

  • @tratkotratkov126  1 month ago +1

    How does StarRocks compare with Databricks with the Photon engine?

    • @stanshen5207  8 days ago

      StarRocks is around 2x the performance compared to Databricks SQL with Photon, and it is 100% open source