Data Lakehouse: An Introduction

  • Added June 30, 2024
  • What is the Data Lakehouse and why is it so important? In this first video of a series, we look at how the Data Lakehouse compares to the traditional SQL data warehouse, where it meets or exceeds that functionality, and where it is still lacking.
    Join my Patreon Community and Watch this Video without Ads!
    www.patreon.com/bePatron?u=63...
    Example Slides & Notebook at:
    github.com/bcafferky/shared/b...
    You need to unzip the file and import the notebook into Databricks to run the code.
    See my complete Databricks training series at:
    • Master Databricks and ...
  • Science & Technology

Comments • 39

  • @ayandapeter1681 23 days ago +1

    Sir, I just want to say thank you so much. I've gone through many videos but was still confused; you made this crystal clear with your conceptual approach.

    • @BryanCafferky 22 days ago

      Thank you for the kind words. I'm so glad my videos are helping you. That's why I do them. I know this technology is not easy to learn, so kudos to you for sticking with it.

  • @HasanCatalgol 1 day ago

    Underrated channel, really quality information.

  • @brokejohnnylive1530 7 days ago

    Dude you are on the money!! Agree all 100%.

  • @DenisGorev-xj5hl 1 year ago +3

    It is amazing how concisely you put so much information in one video! Great!

  • @sujithravindran7082 10 months ago +2

    I really enjoyed the perspective you brought into the evolution. Great work. Please keep bringing in these great videos. Thank you very much.

  • @janni9789 1 year ago +1

    Again, perfectly explained. Thank you

  • @wennie2939 1 year ago +1

    Best video on this topic ever!

  • @WeAreTeamNovus 1 year ago

    Amazing stuff, as always!

  • @GILLOS21 1 year ago

    Amazing lecture! Thank you!

  • @jayashreetheagarajan2708

    Amazing content. Thank you, Bryan.

  • @gardnmi 1 year ago +4

    I'd love to see an unbiased comparison between Delta Lake, Hudi, and Iceberg.

    • @BryanCafferky 1 year ago +2

      So would I. lol. Iceberg seems to be Snowflake's version of Lakehouse. Not sure about Hudi.

    • @BryanCafferky 1 year ago +1

      Looks like Amazon is promoting Hudi.

  • @kamalesht5942 1 year ago

    Your videos are really helping me improve my core knowledge of Data Engineering concepts. Thank you!

  • @ChristianWDegn 1 year ago

    Good presentation. Thanks!

  • @maheshthati1320 6 months ago

    Best explanation

  • @stu8924 1 year ago

    Thank you Bryan.

  • @BhaveshKumar-dz8hq 4 months ago

    you are a hidden gem

  • @avishaysebban1515 1 year ago

    you're the best thank you.

  • @rich111296 1 year ago +1

    Do you have an example in any of your videos of connecting to an S3 bucket by specifying an endpoint within Databricks? Basically, how do you connect to an S3 bucket from a service other than AWS? Thanks

    • @BryanCafferky 1 year ago

      Hmmmm.... No, I have not tried that. Have you googled it?

    • @rich111296 1 year ago

      @BryanCafferky Yeah, ha, I did find a solution eventually, I think somewhere on Stack Overflow. I searched around several places, so I don't have the exact source.
      "sc

    • @rich111296 1 year ago

      and run the function obvi
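
For readers with the same question: the snippet in the reply above is cut off in the source, so what follows is not that commenter's code. It is a minimal sketch of one common approach, assuming the bucket is read through Hadoop's S3A connector and that the cluster lets you set its Hadoop configuration. The endpoint URL, credentials, bucket name, and both function names are hypothetical placeholders; the `fs.s3a.*` keys are standard S3A settings.

```python
def s3a_endpoint_conf(endpoint, access_key, secret_key, path_style=True):
    """Return Hadoop S3A settings for an S3-compatible store at a custom endpoint."""
    return {
        "fs.s3a.endpoint": endpoint,
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        # Many non-AWS stores expect path-style URLs (host/bucket/key).
        "fs.s3a.path.style.access": str(path_style).lower(),
    }


def apply_hadoop_conf(sc, conf):
    """Copy each setting onto the SparkContext's Hadoop configuration."""
    hadoop_conf = sc._jsc.hadoopConfiguration()
    for key, value in conf.items():
        hadoop_conf.set(key, value)


# Placeholder values; in practice read secrets from a secret store, not source code.
conf = s3a_endpoint_conf("https://storage.example.com", "MY_KEY", "MY_SECRET")

# In a Databricks notebook `sc` and `spark` are predefined, so the usage would be:
# apply_hadoop_conf(sc, conf)
# df = spark.read.parquet("s3a://my-bucket/some/path")
```

Whether path-style access is needed depends on the storage provider, so treat that flag as something to check against the provider's documentation.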

  • @potnuruavinash 1 month ago +1

    Can we implement a data lakehouse with open source tools like Spark, Presto, and Hive Metastore? Is there any alternative to Unity Catalog in the open source ecosystem?

    • @BryanCafferky 1 month ago

      Lakehouse is just Delta Lake, i.e., delta tables, which are available in open source Spark, so yes. Unity Catalog is really just a catalog of catalogs, so you could build your own central catalog by extracting the metadata from local Hive metastores. I believe Spark tends to work one cluster at a time, unlike Databricks, which spins up any number of clusters as needed, so I'm not sure if UC could be implemented on open source Spark, but perhaps?
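
As a rough illustration of the "delta tables on open source Spark" point, here is a minimal sketch, assuming the `pyspark` and `delta-spark` packages are installed locally. The two configuration values are the ones the Delta Lake quickstart uses to enable Delta in a Spark session; the function names, app name, and table path are hypothetical.

```python
def delta_spark_conf():
    """Spark settings that enable Delta Lake SQL support and its catalog."""
    return {
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }


def build_delta_session(app_name="lakehouse-demo"):
    """Create a local SparkSession with Delta Lake enabled."""
    # Imported lazily so this module loads even where Spark is not installed.
    from pyspark.sql import SparkSession
    from delta import configure_spark_with_delta_pip

    builder = SparkSession.builder.appName(app_name)
    for key, value in delta_spark_conf().items():
        builder = builder.config(key, value)
    # Adds the matching Delta Lake jars to the session before it starts.
    return configure_spark_with_delta_pip(builder).getOrCreate()


# Usage, on a machine with Spark and delta-spark installed:
# spark = build_delta_session()
# spark.range(5).write.format("delta").save("/tmp/delta-demo")
# spark.read.format("delta").load("/tmp/delta-demo").show()
```

This covers only the table format; it does not provide anything like Unity Catalog's cross-workspace governance.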

  • @prarthananeesh 3 months ago

    Is it mainly used for OLAP, or can it be used for OLTP also?

    • @BryanCafferky 3 months ago

      It's meant for data warehousing, i.e., lakehouse = lake + warehouse, so a warehouse on a data lake. OLTP has stringent requirements like high transaction concurrency, referential integrity, etc. Delta logging is done at a file level, whereas SQL databases log at a row level. See my videos on Delta logs to get an understanding of what I mean.

    • @BryanCafferky 3 months ago

      Delta Logs 1: czcams.com/video/pCH_qNqnms0/video.html
      Delta Logs 2: czcams.com/video/ZSTJLfZy_Hs/video.html
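
To make the file-level logging point concrete, here is a toy sketch of replaying a Delta-style transaction log. It assumes a heavily simplified form of the log's `add` and `remove` actions (real entries carry more fields, such as size and timestamps), and the file names are made up. The point it illustrates is that changing a single row shows up in the log as one whole data file being removed and a rewritten file being added.

```python
import json


def active_files(log_lines):
    """Replay Delta-style commit actions and return the set of live data files."""
    files = set()
    for line in log_lines:
        action = json.loads(line)
        if "add" in action:
            files.add(action["add"]["path"])
        elif "remove" in action:
            files.discard(action["remove"]["path"])
    return files


log = [
    '{"add": {"path": "part-0000.parquet"}}',
    '{"add": {"path": "part-0001.parquet"}}',
    # Updating one row stored in part-0000: the whole file is replaced.
    '{"remove": {"path": "part-0000.parquet"}}',
    '{"add": {"path": "part-0002.parquet"}}',
]

print(sorted(active_files(log)))  # ['part-0001.parquet', 'part-0002.parquet']
```

A row-level log, as in a typical OLTP database, would instead record the individual row change, which is part of why many small concurrent writes are a poor fit for this design.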

  • @prarthananeesh 3 months ago

    Can we use the lakehouse to replace a transactional system?

    • @BryanCafferky 3 months ago

      See my reply to your question about OLTP.