Why Data Skew Will Ruin Your Spark Performance
- Added on 26 July 2024
- Spark Performance Tuning
Welcome back to my channel. In this comprehensive Apache Spark tutorial, we cover Apache Spark optimization techniques. Are you struggling with Data Skew and uneven partitioning while running Spark jobs? You're not alone! In this video, we dive deep into Spark Performance Tuning and Data Engineering to tackle the common issue of Data Skew. We'll discuss the causes, the signs, and most importantly, the solutions for managing uneven data distribution and optimizing your Spark applications' performance, with practical Apache Spark examples.
🔍 Key takeaways from the video:
Understanding Data Skew: Unveiling the meaning and the impact of data skew on your Spark applications.
Identifying Data Skew: Using the Spark UI to pinpoint data skew and its implications on your application's runtime.
Spark Performance Tuning: Techniques to deal with skewed data, optimize resource utilization, and enhance the performance of your Spark jobs.
Data Engineering Best Practices: Sharing key insights into managing data effectively for optimal performance.
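To make the takeaways above concrete, here is a minimal sketch in plain Python of how one dominant key produces one oversized partition. It is not the code from the video or the linked GitHub notebook. The function names, the key names (customer_0 and so on), and the 80% hot-key share are all hypothetical, and zlib.crc32 only stands in for whatever hash function Spark actually uses when it shuffles rows by key.

```python
import random
import zlib
from collections import Counter


def simulate_skewed_keys(num_rows, hot_fraction=0.8, num_other_keys=100, seed=42):
    """Return a list of keys in which one hypothetical hot key dominates."""
    rng = random.Random(seed)
    keys = []
    for _ in range(num_rows):
        if rng.random() < hot_fraction:
            keys.append("customer_0")  # hypothetical hot key
        else:
            keys.append(f"customer_{rng.randint(1, num_other_keys)}")
    return keys


def rows_per_partition(keys, num_partitions):
    """Assign each key to a partition by hash and count the rows per partition.

    crc32 is an assumption used for a deterministic demo; it is not Spark's hash.
    """
    counts = Counter()
    for key in keys:
        partition = zlib.crc32(key.encode("utf-8")) % num_partitions
        counts[partition] += 1
    return [counts[p] for p in range(num_partitions)]


def skew_ratio(sizes):
    """Largest partition size divided by the mean partition size."""
    return max(sizes) / (sum(sizes) / len(sizes))


if __name__ == "__main__":
    keys = simulate_skewed_keys(num_rows=10_000)
    sizes = rows_per_partition(keys, num_partitions=8)
    print("rows per partition:", sizes)
    print("skew ratio (max / mean):", round(skew_ratio(sizes), 2))
```

Every row with the same key lands in the same partition, so the task that processes the hot key's partition does most of the work while the others finish early. This is the pattern to look for in the Spark UI: one task in a stage running far longer than the rest.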
💡 This video is perfect for data engineers, big data enthusiasts, and anyone looking to optimize their Spark applications and tackle data skew head-on.
📄Complete Code on GitHub: github.com/afaqueahmad7117/sp...
🎥 Full Spark Performance Tuning Playlist: • Apache Spark Performan...
🔗 LinkedIn: / afaque-ahmad-5a5847129
Chapters:
00:00 Introduction
00:40 How to identify a Data Skew?
02:28 When does Data Skew happen?
04:27 Operations that cause Data Skew
06:18 Why is Data Skew bad? Why does it matter?
07:36 Code example to simulate a skewed dataset
📌 Don't forget to like, share, and subscribe to stay updated with the latest tech and coding content. Hit the notification bell to never miss an update!
#dataanalytics #DataEngineering #ApacheSpark #PerformanceTuning #DataSkew #BigData #TechTips #Coding #SparkPerformanceTuning
It's a really great video. Most people explain things at a high level, but I can see your videos go in depth.
I appreciate you liked it :)
Thanks!!
Hi Afaque, it would be really helpful if you demonstrated all the topics of Spark optimization (shuffling, salting, tuning configuration, etc.) in a single video where you implement everything based on different scenarios. Thank you for your videos.
Hi @sayedsamimahamed5324, I have a playlist explaining these topics (shuffling, salting, tuning) in detail with code examples. They're separated into distinct videos so that each is easy to absorb, because each has a complexity of its own :)
Playlist: czcams.com/play/PLWAuYt0wgRcLCtWzUxNg4BjnYlCZNEVth.html
Great video, Ahmad. This video is so crisp and clear. Btw, do you upload your notebooks anywhere? Please do share, it really helps, bro.
Thanks @mukeshc8172 for the appreciation. I've updated the description with the GitHub link for the notebook :)
Very informative, but I suggest making the videos shorter.
Amazing, one more video from you. How do we fix this issue?
Coming soon this week on AQE, Broadcast Joins & Salting! :)
Fix Data Skew Using AQE & Broadcast Joins: czcams.com/video/bRjVa7MgsBM/video.html
Fix Data Skew Using Salting: czcams.com/video/rZGsc5y8AQk/video.html
Really great videos. Is it possible to connect with you?
Thanks @atifiu, you could send me a connection request on LinkedIn :)
True, brother, great depth in explanation.