Speed Up Your Spark Jobs Using Caching

  • Added July 26, 2024
  • Welcome to our easy-to-follow guide on Spark Performance Tuning, homing in on the essentials of Caching in Apache Spark. Ever been curious about Lazy Evaluation in Spark? I've got it broken down for you. Dive into the world of Spark's Lineage Graph and understand its role in performance.
    The age-old debate, Spark Persist vs. Cache, is also tackled in this video to clear up any confusion. Learn about the different Storage Levels in Spark used with Persist and how they can make a difference in your tasks.
    📄 Complete Code on GitHub: github.com/afaqueahmad7117/sp...
    🎥 Full Spark Performance Tuning Playlist: • Apache Spark Performan...
    🔗 LinkedIn: / afaque-ahmad-5a5847129
    Table credits (Storage Levels, When to use what?): sparkbyexamples.com/spark/spa...
    Chapters:
    00:00 Introduction
    00:39 Why Should You Use Caching?
    06:45 Lazy Evaluation & How Could Caching Help You?
    10:12 Code + Spark UI Explanation Caching vs No Caching
    14:21 Persist & Storage Levels In Persist
    #spark #dataengineering #apachespark #lazyevaluation #lineagegraph #storagelevel #persist #cache #persistvscache #sparkperformancetuning #sparkoptimization #uncache #unpersist

Comments • 16

  • @HimanshuGupta-xq2td (27 days ago)

    The content is useful.
    Please make more videos 😊

  • @deepakrawat418 (10 months ago, +1)

    Great explanation. Please create an end-to-end project as well.

  • @hritiksharma7154 (9 months ago)

    Great explanation. Waiting for new videos.

  • @OmairaParveen-uy7qt (10 months ago)

    Explained very well!
    Great content!

  • @kunalberry5776 (10 months ago)

    Very informative video. Thanks for sharing.

  • @AtifImamAatuif (10 months ago)

    Excellent content. Very Helpful.

  • @mission_possible (10 months ago)

    Thanks for the videos... keep going

  • @RohanKumar-mh3pt (10 months ago)

    Kindly cover Apache Spark scenario-based questions as well.

  • @ManojKumarV11 (7 months ago)

    Can we persist any dataframe irrespective of the size of the data it has? Or are there any limitations in caching dataframes?

  • @gananjikumar5715 (9 months ago)

    Thanks for sharing. A small query:
    Should we decide to cache based on the number of transformations applied to a DataFrame, or on the number of actions that use it?

    • @afaqueahmad7117 (9 months ago)

      Thanks @gananjikumar5715. Transformations are accumulated until an action is called, so the decision should be based on the number of actions. If you're performing several actions on the same DataFrame, it is better to cache it first; otherwise Spark will re-execute the DAG from the start each time a new action runs.
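
The point in the reply above can be illustrated with a toy model in plain Python. This is a sketch, not Spark itself: `LazyDataset`, `run_actions`, and the `lineage_runs` counter are hypothetical names invented for the illustration. They mimic how transformations are only recorded, how each action re-runs the recorded steps, and how a `cache()` call (the PySpark equivalent would be `df.cache()`) lets later actions reuse the first result.

```python
class LazyDataset:
    """Toy stand-in for a Spark DataFrame (hypothetical, for illustration only)."""

    def __init__(self, source, steps=(), stats=None):
        self._source = list(source)
        self._steps = tuple(steps)      # recorded transformations (the "lineage")
        self._use_cache = False
        self._cached = None
        self.stats = stats if stats is not None else {"lineage_runs": 0}

    def map(self, fn):
        # Transformation: only extends the lineage, nothing is computed yet.
        return LazyDataset(self._source, self._steps + (fn,), self.stats)

    def cache(self):
        # Only marks the dataset; the first action fills the cache.
        self._use_cache = True
        return self

    def _materialize(self):
        if self._cached is not None:
            return self._cached
        self.stats["lineage_runs"] += 1  # the whole lineage is executed again
        rows = self._source
        for fn in self._steps:
            rows = [fn(r) for r in rows]
        if self._use_cache:
            self._cached = rows
        return rows

    def count(self):
        # Action: triggers execution.
        return len(self._materialize())

    def collect(self):
        # Action: triggers execution.
        return list(self._materialize())


def run_actions(use_cache):
    """Run three actions on the same dataset and report how often the lineage ran."""
    ds = LazyDataset(range(5)).map(lambda x: x * 2).map(lambda x: x + 1)
    if use_cache:
        ds.cache()
    ds.count()
    ds.collect()
    ds.count()
    return ds.stats["lineage_runs"]


runs_without_cache = run_actions(use_cache=False)
runs_with_cache = run_actions(use_cache=True)
```

In this toy model the number of transformations (two `map` calls) does not change the outcome; what matters is that three actions are executed. Without caching, the lineage runs once per action, and with caching it runs only for the first one.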

  • @anirbansom6682 (8 months ago)

    If we do not explicitly unpersist, what would happen to the data? Would it be cleaned up by the next GC cycle? Also, what is the best practice: explicitly unpersist, or leave it to GC?

    • @afaqueahmad7117 (7 months ago, +2)

      Hey @anirbansom6682, the data would be kept in memory until the Spark application ends, the context is stopped, or the data is evicted because Spark needs to free up memory to make room for other data. It may also be evicted during the next GC cycle, but this process is a little uncertain, as it depends entirely on Spark's own memory management policies and the JVM's garbage collection process.
      Leaving it to GC is a passive approach over which you have less control; it is much more like a black box unless you're well aware of its policies.
      The best practice, however, is to explicitly unpersist DataFrames when they're no longer needed. This gives you more control over your application's memory usage and can help prevent memory issues in long-running Spark applications where different datasets are cached over time.
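
One way to make the explicit unpersist hard to forget is to wrap it in a context manager. The sketch below is an assumption, not something shown in the video: `cached` is a hypothetical helper, and `FakeDataFrame` is a stand-in so the example runs without a cluster. The helper only relies on `cache()` and `unpersist()`, which PySpark DataFrames also provide.

```python
from contextlib import contextmanager


@contextmanager
def cached(df):
    """Cache df for the duration of the with-block, then release it (hypothetical helper)."""
    df.cache()
    try:
        yield df
    finally:
        # Runs even if an action inside the block raises.
        df.unpersist()


class FakeDataFrame:
    """Stand-in that records calls, so the sketch runs without Spark."""

    def __init__(self):
        self.is_cached = False
        self.events = []

    def cache(self):
        self.is_cached = True
        self.events.append("cache")
        return self

    def unpersist(self):
        self.is_cached = False
        self.events.append("unpersist")
        return self

    def count(self):
        self.events.append("count")
        return 0


fake = FakeDataFrame()
with cached(fake) as df:
    df.count()
    df.count()
    cached_inside_block = fake.is_cached
```

With this pattern the cached data is released as soon as the block that needs it finishes, rather than whenever eviction happens to occur.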

  • @reyazahmed4855 (9 months ago)

    Nice video. By the way, what device do you use to write on the screen while teaching, bro?