Speed Up Your Spark Jobs Using Caching
- Added 26 Jul 2024
- Welcome to our easy-to-follow guide on Spark Performance Tuning, homing in on the essentials of Caching in Apache Spark. Ever been curious about Lazy Evaluation in Spark? I've got it broken down for you. Dive into the world of Spark's Lineage Graph and understand its role in performance.
The age-old debate, Spark Persist vs. Cache, is also tackled in this video to clear up any confusion. Learn about the different Storage Levels in Spark used with Persist and how they can make a difference in your tasks.
📄 Complete Code on GitHub: github.com/afaqueahmad7117/sp...
🎥 Full Spark Performance Tuning Playlist: • Apache Spark Performan...
🔗 LinkedIn: / afaque-ahmad-5a5847129
Table credits (Storage Levels, When to use what?): sparkbyexamples.com/spark/spa...
Chapters:
00:00 Introduction
00:39 Why Should You Use Caching?
06:45 Lazy Evaluation & How Could Caching Help You?
10:12 Code + Spark UI Explanation: Caching vs No Caching
14:21 Persist & Storage Levels In Persist
#spark #dataengineering #apachespark #lazyevaluation #lineagegraph #storagelevel #persist #cache #persistvscache #sparkperformancetuning #sparkoptimization #uncache #unpersist
Content is useful.
Please make more video 😊
Appreciate it @HimanshuGupta-xq2td, thank you :)
great explanation, please create one end-to-end project also
Great explanation. Waiting for new videos.
Explained very well!
Great content!
Very informative video. Thanks for sharing
Excellent content. Very Helpful.
Thanks for the videos... keep going
kindly cover Apache Spark scenario-based questions also
Can we persist any dataframe irrespective of the size of the data it has? Or are there any limitations in caching dataframes?
Thanks for sharing, small query
Do we need to cache based on the number of transformations being done on that DataFrame, or on whether we are performing more actions on (i.e. reusing) that DataFrame?
Thanks @gananjikumar5715! Transformations are accumulated until an action is called, so it's based on the number of actions. If you're performing several actions, it's better to cache the DataFrame first; otherwise Spark will re-build the DAG and recompute the lineage every time a new action executes.
If we do not explicitly unpersist, what would happen to the data? Would it be cleaned by the next GC cycle? Also, what is the best practice: explicitly unpersist, or leave it to GC?
Hey @anirbansom6682, the data would be kept in memory until the Spark application ends, the context is stopped, or the data is evicted because Spark needs to free up memory for other blocks. It may also be evicted during a GC cycle, but this process is somewhat uncertain, as it depends entirely on Spark's own memory management policies and the JVM's garbage collection process.
Leaving it to GC is a passive approach over which you have less control; it behaves much like a black box unless you're well aware of its policies.
The best practice, however, is to explicitly unpersist DataFrames when they're no longer needed. This gives you more control over your application's memory usage and can help prevent memory issues in long-running Spark applications where different datasets are cached over time.
Nice video. By the way, what device do you use to write on the screen for teaching, bro?
Thanks @reyazahmed4855, I use an iPad