Using Apache Arrow, Calcite and Parquet to build a Relational Cache | Dremio

  • Added: July 7, 2024
  • Download slides for this talk: goo.gl/eMWk8i
    Everybody wants to get to data faster. As we move from general solutions to more specific optimization techniques, the performance impact grows. This talk will discuss how in-memory caching, columnar storage and relational caching can be layered to substantially improve data science and analytical workloads. It will include a detailed overview of how you can use Apache Arrow, Calcite and Parquet to improve performance by multiple orders of magnitude over what is currently possible.
    We'll start by talking about in-memory caches and the difference between block-based and data-aware caching strategies. We'll discuss the deployment design of this type of solution and cover the strengths of each strategy. There will also be a discussion of the relationship between security and predicate application in these scenarios. Then we'll go into detail about how columnar storage formats can further enhance performance by minimizing read time, optimizing for vectorized in-memory processing, and applying powerful compression techniques.
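As a rough illustration of the columnar idea mentioned above, here is a minimal pure-Python sketch. It does not use Arrow or Parquet; the sample rows and the helper names `to_columns` and `run_length_encode` are hypothetical and only show why a column layout lets a query touch one column and why repeated values compress well.

```python
from itertools import groupby

# Hypothetical sample rows; names and values are made up for illustration.
rows = [
    {"region": "east", "product": "a", "amount": 10},
    {"region": "east", "product": "b", "amount": 20},
    {"region": "west", "product": "a", "amount": 5},
    {"region": "west", "product": "a", "amount": 7},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# An aggregate over one column reads only that column's values.
total_amount = sum(columns["amount"])

# A sorted, low-cardinality column shrinks to a few (value, count) pairs.
encoded_region = run_length_encode(columns["region"])
```

Real columnar formats use more elaborate encodings than this, but the trade-off is similar: work is proportional to the columns a query needs, not to the full row width.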
    Lastly, we'll introduce a much more advanced way to speed up access to data, called relational caching. Relational caching builds on columnar in-memory caching techniques but also includes a full comprehension of how data is being used and how different forms of data relate to each other. This will include leveraging multiple sorting and partitioning strategies as well as maintaining multiple related derivations of data for different types of access patterns. As part of this, we'll also cover approaches to data TTL, relational cache consistency and several different approaches to data mutation and real-time updates.
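The idea of keeping several related derivations of the same data and picking one per query can be sketched roughly as follows. This is an illustrative toy in pure Python, not Dremio's or Calcite's implementation: the `Derivation` class, the helper functions, the sample rows and the "fewest rows wins" cost rule are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical raw data; names and values are made up for illustration.
raw_rows = [
    {"region": "east", "product": "a", "amount": 10},
    {"region": "east", "product": "b", "amount": 20},
    {"region": "west", "product": "a", "amount": 5},
    {"region": "west", "product": "a", "amount": 7},
]


@dataclass
class Derivation:
    """A precomputed SUM(amount) of the raw rows, grouped by `dimensions`."""
    name: str
    dimensions: tuple
    rows: dict  # maps a tuple of dimension values to the summed amount


def build_derivation(name, rows, dimensions):
    """Aggregate `rows` by the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dimensions)] += row["amount"]
    return Derivation(name, tuple(dimensions), dict(totals))


def choose_derivation(derivations, query_dimensions):
    """Pick the smallest derivation whose dimensions cover the query's."""
    candidates = [
        d for d in derivations if set(query_dimensions) <= set(d.dimensions)
    ]
    return min(candidates, key=lambda d: len(d.rows)) if candidates else None


def answer(derivations, rows, query_dimensions):
    """Return (source name, totals grouped by query_dimensions)."""
    chosen = choose_derivation(derivations, query_dimensions)
    if chosen is None:
        # No cached derivation covers the query: fall back to the raw rows.
        return "raw", build_derivation("raw", rows, query_dimensions).rows
    # Roll the chosen derivation up to the requested dimensions.
    positions = [chosen.dimensions.index(d) for d in query_dimensions]
    totals = defaultdict(int)
    for key, amount in chosen.rows.items():
        totals[tuple(key[p] for p in positions)] += amount
    return chosen.name, dict(totals)


derivations = [
    build_derivation("by_region_product", raw_rows, ["region", "product"]),
    build_derivation("by_region", raw_rows, ["region"]),
]
```

With this setup, `answer(derivations, raw_rows, ["region"])` is served from the smaller `by_region` derivation, while a query grouped by `["product"]` is rolled up from `by_region_product`. The roll-up is only safe here because sums can be re-aggregated; other measures would need different handling.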
    ABOUT DATA COUNCIL:
    Data Council (www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers. Make sure to subscribe to our channel for more videos, including DC_THURS, our series of live online interviews with leading data professionals from top open source projects and startups.
    FOLLOW DATA COUNCIL:
    Twitter: / datacouncilai
    LinkedIn: / datacouncil-ai
    Facebook: / datacouncilai
    Eventbrite: www.eventbrite.com/o/data-cou...
  • Science & Technology

Comments • 8

  • @allansene2406 5 years ago +2

    Amazing talk! Sums up everything that we learn on Introduction to Databases in 40 minutes, with a pragmatic view.

  • @MrFuckoffdipshit 6 years ago +1

    This is an excellent talk. Are these slides available online? The diagram from ~21:00 is not shown. Thank you.

    • @DataCouncil 6 years ago

      Yes, the slides for this talk are available at the given link in the description.

    • @allansene2406 5 years ago

      @DataCouncil This page says that the slides are not available :(

    • @DataCouncil 5 years ago

      @allansene2406 Unfortunately, the slides are no longer available; some speakers only give us limited use rights for their slides. Sorry about that :(

    • @badrulchowdhury2628 5 years ago +3

      Slides can be found here: www.slideshare.net/dremio/using-apache-arrow-calcite-and-parquet-to-build-a-relational-cache-81440786

  • @PandemicGameplay 3 years ago +4

    A lot of talking and absolutely zero substance.