Apache Spark Executor Tuning | Executor Cores & Memory

Afaque Ahmad

6,672 views

Added on 26 July 2024
Welcome back to our comprehensive series on Apache Spark Performance Tuning & Optimisation! In this guide, we dive deep into the art of executor tuning in Apache Spark to ensure your data engineering tasks run efficiently.
🔹 What is inside:
Learn how to properly allocate CPU and memory resources to your Spark executors and the number of executors to create to achieve optimal performance. Whether you're new to Apache Spark or an experienced data engineer looking to refine your Spark jobs, this video provides valuable insights into configuring the number of executors, memory, and cores for peak performance. I’ve covered everything from understanding the basic structure of Spark executors within a cluster, to advanced strategies for sizing executors optimally, including detailed examples and calculations.
📘 Resources:
📄 Complete Code on GitHub: github.com/afaqueahmad7117/sp...
🎥 Full Spark Performance Tuning Playlist: • Apache Spark Performan...
🔗 LinkedIn: / afaque-ahmad-5a5847129
Chapters:
0:00 - Introduction to Executor Tuning in Apache Spark
0:37 - Understanding Executors in a Spark Cluster
3:30 - Example: Sizing Executors in a Cluster
4:58 - Example: Sizing a Fat Executor
9:34 - Example: Sizing a Thin Executor
12:50 - Advantages and Disadvantages of Fat Executor
18:25 - Advantages and Disadvantages of Thin Executor
22:12 - Rules for sizing an Optimal Executor
26:30 - Example 1: Sizing an Optimal Executor
38:15 - Example 2: Sizing an Optimal Executor
43:50 - Key Takeaways
#ApacheSparkTutorial #SparkPerformanceTuning #ApacheSparkPython #LearnApacheSpark #SparkInterviewQuestions #ApacheSparkCourse #PerformanceTuningInPySpark #ApacheSparkPerformanceOptimization #ApacheSpark #DataEngineering #SparkTuning #PythonSpark #ExecutorTuning #SparkOptimization #DataProcessing #pyspark #databricks

Comments • 84

@dudechany 2 days ago ⁺¹
Every time I come here before an interview, I try to give this video a like, but end up realising that I already did it earlier. Best video on this topic on the whole internet.
@bijjigirisupraja8021 19 days ago ⁺¹
Bro, please make videos on Spark regularly; it will be very helpful. Thank you.
@mohitupadhayay1439 17 days ago
Really waiting to see if you can add some real world use cases to your videos to strengthen our understanding. It will be appreciated a lot man!
@BabaiChakraborty-ss8pt 3 months ago ⁺¹
Man, your tutorials are the best. I have been following you for Spark tuning related videos. Thanks
@afaqueahmad7117 2 months ago
Thank you @BabaiChakraborty-ss8pt, really appreciate it, means a lot to me :)
@mayapareek2844 2 months ago
Wow!! Great content!! I am preparing for interviews and found this super helpful. Thanks a ton!!
@afaqueahmad7117 2 months ago
Glad you're finding it helpful @mayapareek2844, heartfelt thanks :)
@SandeepPatel-wt7ye 20 days ago
This is awesome stuff. The executor tuning concept is explained at a very granular level.
@afaqueahmad7117 19 days ago ⁺¹
Appreciate it @SandeepPatel-wt7ye, thank you!
@harshshah8884 10 days ago
@afaqueahmad7117 Quick question: let's say I have limited RAM available, like 50 GB, and want to process 1 TB of data, and no additional capacity can be added to the cluster. How should we apply the approach from your video to find the optimal number of executors, memory per executor and cores per executor?
@iamexplorer6052 3 months ago
Thanks for this. I am currently working on job optimization, so it is very useful to me.
@afaqueahmad7117 3 months ago
Thank you, really appreciate it :)
@seenu0104 3 months ago
Thank you very much for this amazing content with super easy explanation 👏👏
@afaqueahmad7117 3 months ago
Thank you @seenu0104, really appreciate it :)
@adtempgupta 3 months ago ⁺¹
Thank you so much for the wonderful content. Please start a PySpark session.
@saineelkiranch9790 3 months ago
Excellent. Very well explained.
@afaqueahmad7117 3 months ago
Thank you @saineelkiranch9790, really appreciate it :)
@sankarshkadambari2742 3 months ago
Amazing is the word; you never disappoint us. Very grateful and indebted to you for this excellent content you are creating. God bless you!
@afaqueahmad7117 2 months ago
Thank you @sankarshkadambari2742, really appreciate it, means a lot to me :)
@leilaturgarayeva105 3 months ago
Thank you for the useful content! IRL an analyst / engineer would have access to a huge cluster which is shared between many people / teams. It would be very interesting to watch a video where you calculate the amount of resources that should be requested based on the task at hand (particular dataset, task and output). And again - thanks for helping to understand these somewhat hard to grasp concepts :-)
@AshishStudyDE 1 month ago
Great work, going good. I hope you cover two more topics, driver OOM and executor OOM: why they happen and how we can tackle them.
@ComedyXRoad 2 months ago
Thanks for the content and your efforts.
@afaqueahmad7117 2 months ago
Thank you @ComedyXRoad, appreciate the kind words :)
@asokanramasamy2087 3 months ago
Great! If possible, please make a video on Spark Streaming as well!
@chitransh847 22 days ago
Sir, can you please bring a Python and SQL series for interview prep, including the basics? The rest of the content is just great!
@afaqueahmad7117 19 days ago
Thank you, appreciate it @chitransh847, Python coming soon :)
@yashwantdhole7645 29 days ago
Hi Afaque, it was a really nice video. I never got such a detailed understanding anywhere else. Do you also provide 1:1 sessions? If yes, I am highly interested.
@afaqueahmad7117 28 days ago
Hey @yashwantdhole7645, appreciate the kind words, means a lot. At this moment, I do not take 1:1 sessions, but if you have any questions feel free to shoot an email or comment here in this thread :)
@purnimasharma9734 2 months ago
Hello Afaque, your tutorials are excellent and I learnt so much about optimization techniques. I am wondering if you can add some real world use cases to your videos to strengthen our understanding. It will be appreciated a lot.
@Amarjeet-fb3lk 2 months ago
Thanks for these videos.
I have been watching your videos for quite a while.
You explain things in a very easy and simple manner.
But
I think in real-world projects we would be processing a very large amount of data,
so it would be great if you could make a video on processing large amounts of data with all the optimisation techniques we can use.
Thanks in advance.
@afaqueahmad7117 2 months ago
Hey @Amarjeet-fb3lk, Thank you so much for the kind words; they truly mean a lot! I'm delighted to hear that you find the explanations easy and simple to understand. While production/large-scale projects are in the future plans, I would like to emphasize that the fundamental concepts and optimization techniques remain the same. My goal is to help you build a rock solid understanding of these concepts so you can confidently apply them in any scenario.
@iamkiri_ 3 months ago
Awesome :)
@afaqueahmad7117 2 months ago
Thank you @iamkiri_, really appreciate it :)
@yatinchadha1803 2 months ago
Thanks Afaque for this great tutorial. This will really help while working on Spark optimization. It would be of great help if you could tell us how you deal with this type of question:
Spark cluster size: 200 cores and 100 GB RAM
Data to be processed: 100 GB
Give the calculation for driver memory, driver cores, executor memory, overhead memory and number of executors.
@afaqueahmad7117 1 month ago
Hey @yatinchadha1803, thanks for the kind words, really appreciate it. Regarding the question - after watching the video, it should be a cakewalk :)
@yatinchadha1803 1 month ago ⁺¹
@afaqueahmad7117 can you please guide on how to calculate the driver memory and driver cores?
@wreckergta5470 3 months ago
Thanks
@afaqueahmad7117 3 months ago
Appreciate it, @wreckergta5470 :)
@remedyiq8034 3 months ago ⁺²
Hi, can you please make a video on understanding the Spark UI or the Databricks Spark UI? There are a lot of tabs there; it's tough to understand.
@afaqueahmad7117 3 months ago ⁺³
Hey @remedyiq8034, could you share which tabs are troubling you? I've discussed the most important ones; sharing links below:
1. Storage tab: Caching video (czcams.com/video/FujwRYkBwM4/video.html)
2. SQL tab: Master Reading Spark Query Plans video (czcams.com/video/KnUXztKueMU/video.html)
3. Jobs/Stages/SQL - Unlock Performance With Spark DAG Mastery video (czcams.com/video/O_45zAz1OGk/video.html)
@dataterre 3 months ago
Thanks Afaque, this is an excellent video to start my Saturday morning. It has been on my list to do for the whole week. A couple of questions for you / community since this is very relevant to my current work.
1) Considering we are "exhausting" the cluster resources, could you explain where does driver node come into the picture in this pool of resources (e.g. --driver-memory)? I presume a sizeable amount of driver memory is required since we tend to collect data in the driver node in a count(), etc.
2) Understand the concept of optimal executor sizing here. Suppose my application abstraction is looking at optimal Spark sessions running in parallel, then this optimal tuning here would mean I can only run 1 spark-submit job in the entire cluster, right?
Excellent video, again
@afaqueahmad7117 3 months ago
Hi @dataterre, thank you for the kind words, means a lot to me :) On the questions:
1. Indeed, a reasonable amount of cores and memory is required for the driver because it is the one coordinating the lifecycle of the application, managing communication, and creating and scheduling tasks to be executed on executors. However, in this video, with the specific focus being on "executor" tuning, driver resource allocation is skipped. It's important to note (as you rightly pointed out) that the driver will need resources for its own functioning / executing its responsibilities + collecting data as a result of actions (count(), show() etc.). I would think of subtracting an appropriate number of driver cores and memory from the total cluster cores/memory and then doing the executor sizing discussed in the video.
2. Yes, this example assumes, you're taking up the whole cluster for best utilization. However, if you're looking forward to running multiple Spark sessions in parallel, you could do the following:
a. Enable dynamic allocation (by setting `spark.dynamicAllocation.enabled` to `true`) to allow each session to use resources.
b. Define a reasonable minimum and maximum number of executors per application (by using `spark.dynamicAllocation.minExecutors`, `spark.dynamicAllocation.maxExecutors`)
c. Adjust `spark.executor.cores` and `spark.executor.memory` using the principles/rules as discussed (in video), to ensure that each application gets enough resources to perform efficiently but not so much that it monopolizes cluster resources
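The settings listed in points a to c can be sketched as a plain Python dictionary that is rendered into `spark-submit --conf` arguments. This is only an illustration: the executor counts, the core and memory values, the helper name `to_submit_args` and the script name `my_job.py` are assumptions, not values from the video.

```python
# Illustrative values only; pick real ones with the sizing rules from the video.
shared_cluster_conf = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "2",   # assumed floor per application
    "spark.dynamicAllocation.maxExecutors": "10",  # assumed ceiling per application
    "spark.executor.cores": "4",                   # assumed cores per executor
    "spark.executor.memory": "16g",                # assumed memory per executor
}


def to_submit_args(conf):
    """Render a config dict as a flat list of `--conf key=value` arguments."""
    args = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # `my_job.py` is a hypothetical application script.
    command = ["spark-submit"] + to_submit_args(shared_cluster_conf) + ["my_job.py"]
    print(" ".join(command))
```

Keeping the values in one dictionary makes it easy to give each application its own minimum and maximum while sharing the same executor shape.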
@ashutoshpatkar4891 2 days ago
Hey man, learnt a lot from the video. Please help me out with this doubt.
For example 2, you said total executors = 44/4 = 11. But shouldn't we think machine by machine? Each machine can have 15/4 ≈ 3 executors at 4 cores each, giving a total of 3 executors * 3 nodes = 9. In your workout, it seems like there will be an executor that uses some cores from one node and some from another. Am I wrong in my thought process somewhere?
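The difference between the two calculations in this comment can be sketched as follows. The numbers (3 nodes, 15 usable cores per node, 4 cores per executor, 1 core held back for the application master) are taken or inferred from the comment; the function names are hypothetical. Because an executor is a single JVM on a single node, it cannot span nodes, which is what the per-node version models.

```python
def executors_cluster_wide(nodes, usable_cores_per_node, cores_per_executor, am_cores=1):
    """Pool all usable cores, hold back cores for the application master, then divide."""
    total_cores = nodes * usable_cores_per_node - am_cores
    return total_cores // cores_per_executor


def executors_per_node_packing(nodes, usable_cores_per_node, cores_per_executor):
    """Pack each node separately, since one executor cannot use cores from two nodes."""
    return nodes * (usable_cores_per_node // cores_per_executor)


if __name__ == "__main__":
    print(executors_cluster_wide(3, 15, 4))      # (45 - 1) // 4 = 11
    print(executors_per_node_packing(3, 15, 4))  # 3 * (15 // 4) = 9
```

The gap between the two results is the set of cores left over on each node when the per-node core count is not a multiple of the cores per executor.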
@ajaydhanwani4571 2 days ago
Sorry if I am asking a very basic question: can we set executors per Spark job or per Spark cluster? Also, how do we set this up, with coding examples?
@satheeshkumar2149 3 months ago
How much memory or how many cores should we set aside for the internal stuff if we have a standalone cluster instead of YARN?
@atifiu 2 months ago
Thanks Afaque for this video. I have a question regarding task-level and executor-level parallelism. As per my understanding, 1 partition = 1 task = 1 core/thread. So how is task-level parallelism achieved, given that 1 task is assigned to only one core? That would mean that within an executor the remaining 46 cores are not utilized if the number of tasks is, say, only 5.
@ShubhamWakshe-e4c 26 days ago
If we are already allotting 1 core and 1 GB of RAM for the YARN/OS daemons, then why do we need to allot a separate 1 core and 1 GB, or one executor, for the YARN resource manager?
@ShubhamWakshe-e4c 26 days ago
You talked about the YARN application master. Is it the driver that contains the application master container? Does that mean we are assigning 1 GB as driver memory?
@naveenreddybedadala 1 month ago
Will that final executor memory again be split into user, reserved, unified and overhead memory?
@rohitdeshmukh7274 1 month ago
Very informative video. I have one question. I have a Databricks cluster with autoscaling enabled. Will the calculations change in that case?
@adusumillisudheer2772 25 days ago
Same question from me. When autoscaling is enabled, how will it tune the workers and the executors inside them?
@Amarjeet-fb3lk 2 months ago
Hi @Afaque
I watched this video previously, and I am still watching many more videos that cover Spark memory management, and reading articles on Spark memory and partitions.
So here are some points that I have learnt.
1. Memory for each core should be 4 times 128 MB.
2. Total number of partitions should be 4 * number of cores.
But
how should we decide the number of partitions, each partition's size, and the memory for each core?
Because these things will change according to our data.
So, can you answer these 3 questions?
Thanks.
@roshankumargupta46 2 months ago
Hi Afaque! Can you tell me if I'm wrong here: do thin executors promote more parallelism than fat executors? In the case of thin executors, the number of executors will be higher, resulting in more individual cores, which will eventually promote parallelism. Whereas with fat executors, all cores will be consumed by the executors, which may lead to wastage of resources.
@remedyiq8034 3 months ago ⁺¹
At 35:10, @afaqueahmad7117, I want to add one point. You said that execution happens in execution memory, which is 60 percent, and 40 percent is user memory. So 60 percent of 20 GB is 12 GB of memory, of which 50 percent is for execution and 50 percent for storage. Let's assume 50 percent is given to execution (static allocation): out of 12 GB, only 6 GB is for execution. As we have 5 cores per executor, 6/5 ≈ 1.2 GB of memory per core. The maximum partition size that can be accommodated is therefore 1.2 GB. Is my thought process correct?
@iamkiri_ 3 months ago
Looks like this is a valid question, bro!
@afaqueahmad7117 3 months ago ⁺¹
Hi @remedyiq8034, this is a very valid point and thanks for highlighting this. You're absolutely right about ~1.2GB memory per core. My mind was referring to execution memory but I really appreciate your attention to the breakdown of the `--executor-memory` into its various components, which I should have explained more clearly before doing the memory per core calculation. I'll look into adding an info card to make this clear in the video. Thanks again for your sharp observation!
@remedyiq8034 3 months ago
@afaqueahmad7117 Thanks, I learned a lot from you. Watched all your videos. Keep doing great work for the community. Better than paid courses on Udemy!!
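The arithmetic agreed on in this thread can be sketched as a small function. It follows the comment's simplified model: the default values 0.6 and 0.5 mirror the defaults of `spark.memory.fraction` and `spark.memory.storageFraction`, while the function name is hypothetical. The sketch deliberately ignores the fixed reserved memory and the fact that execution and storage can borrow from each other, so treat the result as a rough guide rather than a hard limit.

```python
def execution_memory_per_core_gb(executor_memory_gb, cores_per_executor,
                                 memory_fraction=0.6, storage_fraction=0.5):
    """Rough execution memory available to each core, as computed in the comment."""
    unified_gb = executor_memory_gb * memory_fraction    # execution + storage pool
    execution_gb = unified_gb * (1 - storage_fraction)   # execution half of the pool
    return execution_gb / cores_per_executor


if __name__ == "__main__":
    # 20 GB executor memory and 5 cores per executor, as in the thread.
    print(round(execution_memory_per_core_gb(20, 5), 2))  # 1.2
```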
@maheshmahadev9918 3 months ago
Great explanation, thanks!! I have a question: can you explain the basis for choosing these numbers? Is it based on the incoming data that needs to be processed? In that case, for the calculations in this video, what data size is considered? Thanks again
@afaqueahmad7117 3 months ago
Hey @maheshmahadev9918, the numbers for the cluster (X nodes, Y cores, Z RAM) are for illustration and independent of the incoming data size. As discussed at 34:06, the reason I'm not talking about incoming data sizes is that they should be tailored to the "memory per core". The most granular unit of data is going to be a "partition", and as long as the core has enough memory to process that partition, things will run fine. I would suggest re-watching 34:06 if unclear :)
@maheshh1695 3 months ago
Hi, thanks for sharing the information.
In the fat executor case, since we have 5 nodes and each node has only one executor, the number of cores should be 5 * 11, i.e. 55 cores, right?
@afaqueahmad7117 1 month ago
Hey @maheshh1695, total cores will be 55, while cores per node is 11.
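The fat and thin layouts discussed in these comments can be compared with a short sketch. It assumes 5 nodes with 11 usable cores and 47 GB of usable memory each, numbers that appear in the comments; the class and function names are hypothetical, and overhead memory is not separated out here.

```python
from dataclasses import dataclass


@dataclass
class ExecutorLayout:
    num_executors: int
    cores_per_executor: int
    memory_per_executor_gb: float


def fat_layout(nodes, cores_per_node, memory_per_node_gb):
    """One executor per node that takes every usable core and all usable memory."""
    return ExecutorLayout(nodes, cores_per_node, memory_per_node_gb)


def thin_layout(nodes, cores_per_node, memory_per_node_gb):
    """One single-core executor per usable core; node memory is split evenly."""
    return ExecutorLayout(nodes * cores_per_node, 1, memory_per_node_gb / cores_per_node)


if __name__ == "__main__":
    fat = fat_layout(5, 11, 47)
    thin = thin_layout(5, 11, 47)
    # Both layouts expose the same 55 cores; only the grouping differs.
    print(fat.num_executors, fat.num_executors * fat.cores_per_executor)
    print(thin.num_executors, thin.num_executors * thin.cores_per_executor)
```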
@Wonderscope1 1 month ago
I really enjoy your videos. Thanks for sharing your knowledge.
I have a question about how you create these videos. It is an amazing way to create tutorial videos. Do you mind sharing what tools you use to make them?
Thanks
@afaqueahmad7117 1 month ago ⁺¹
Thank you @Wonderscope1, really appreciate it. I use Notion and Miro :)
@Wonderscope1 1 month ago
@afaqueahmad7117 I am familiar with Notion as a project management tool; I didn't know it could help with video production. I need to look into that. Thanks 😊
@afaqueahmad7117 1 month ago ⁺¹
Sorry I meant Notion for the code snippets. I use Ecamm Live for video production :)
@Wonderscope1 1 month ago
@afaqueahmad7117 perfect, that's what I was looking for. Thanks :)
@swapnilpatil18 2 months ago
Hi, in the case of the fat executor we assigned all of the remaining 47 GB to the executor (1 GB for Hadoop YARN ops). In this case, where will the executor overhead memory come from?
@afaqueahmad7117 2 months ago
Hey @swapnilpatil18, good question. In the initial parts of the video (before explaining the 4 rules to size an optimal executor), the goal of explaining fat executors was only to point out that they take up a large portion of the memory on a node, and that was the rationale for not separating out the respective parts, i.e. overhead memory and AM memory.
However, your understanding is absolutely correct. The ideal calculation should subtract max(384 MB, 10% of 47 GB) = max(384 MB, 4.7 GB) = 4.7 GB per executor before calculating the `--executor-memory`.
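The overhead subtraction described in this reply can be sketched as below. The function names are hypothetical, and the sketch follows the reply's simplification of taking 10% of the 47 GB available on the node; Spark itself derives the overhead from the configured executor memory, so the real split differs slightly.

```python
def memory_overhead_gb(available_gb, overhead_factor=0.10, min_overhead_mb=384):
    """Overhead as in the reply: max(384 MB, 10% of the memory being split)."""
    return max(min_overhead_mb / 1024, available_gb * overhead_factor)


def executor_memory_gb(available_gb):
    """Memory left for `--executor-memory` once the overhead is set aside."""
    return available_gb - memory_overhead_gb(available_gb)


if __name__ == "__main__":
    print(round(memory_overhead_gb(47), 1))  # 4.7
    print(round(executor_memory_gb(47), 1))  # 42.3
```

For small executors the 384 MB floor wins: with 2 GB available, 10% would be only 0.2 GB, so the overhead is 0.375 GB instead.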
@suresh.suthar.24 2 months ago
Wonderful explanation Ahmad. I have one doubt: in your example, 23 GB of memory will be assigned to each executor, and then 10% will be excluded for overhead memory, so we will be left with 20 GB of memory per executor. So this 20 GB is on-heap memory, and it will be divided into reserved memory, storage memory and execution memory.
Am I right or wrong? Please reply; I have asked my seniors this question but they don't have an answer.
Thank you in advance!
@afaqueahmad7117 2 months ago
Hey @SS1251, you're correct! The 20GB of memory is indeed on-heap memory and it will be divided respectively into reserved, storage, and execution memory. The memory defined through `--executor-memory` or `spark.executor.memory` is the one allocated to on-heap. You can refer to this video to get a better understanding: czcams.com/video/sXL1qgrPysg/video.html :)
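The on-heap division confirmed in this reply can be sketched as follows. The sketch assumes Spark's unified memory manager with its fixed 300 MB of reserved memory and the default values of `spark.memory.fraction` (0.6) and `spark.memory.storageFraction` (0.5); user memory, which the thread does not list, is whatever remains outside the unified pool. The function name is hypothetical.

```python
RESERVED_MB = 300  # fixed reserved memory in Spark's unified memory manager


def on_heap_breakdown_mb(executor_memory_mb, memory_fraction=0.6, storage_fraction=0.5):
    """Split `spark.executor.memory` into reserved, user, storage and execution parts."""
    usable = executor_memory_mb - RESERVED_MB
    unified = usable * memory_fraction    # shared pool for storage + execution
    storage = unified * storage_fraction  # initial storage share of the pool
    return {
        "reserved": RESERVED_MB,
        "user": usable - unified,
        "storage": storage,
        "execution": unified - storage,
    }


if __name__ == "__main__":
    # 20 GB of on-heap memory, as in the thread.
    for region, size_mb in on_heap_breakdown_mb(20 * 1024).items():
        print(f"{region}: {size_mb:.0f} MB")
```

The storage and execution figures are starting shares, not hard walls: under unified memory management either side can borrow unused space from the other.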
@rambabuposa5082 3 months ago
Hi @afaqueahmad7117
At 35:30, you were discussing "memory per core", which was 4 GB per core. If we have partitions of 128 MB or 256 MB with this 4 GB per core configuration, does that mean inefficient utilisation of resources (memory), because one core can process up to 4 GB but the partition size is much smaller?
Do we need to reduce the "memory per core" to get better performance and efficient utilisation of resources?
Many thanks
@afaqueahmad7117 3 months ago
Hey @rambabuposa5082, Good question! 4GB per core was for an example. If the partition sizes are 128MB or 256MB, then this would indeed be underutilising the cluster. You could reduce the memory per core giving some room for overhead (maybe 400MB per core for a 256MB partition), however, it's important to keep the 4 rules of the game as discussed in mind (e.g. keeping number of cores
@remedyiq8034 3 months ago ⁺¹
@afaqueahmad7117 I want to add one point. You said that execution happens in execution memory, which is 60 percent, and 40 percent is user memory. So 60 percent of 20 GB is 12 GB of memory, of which 50 percent is for execution and 50 percent for storage. Let's assume 50 percent is given to execution (static allocation): out of 12 GB, only 6 GB is for execution. As we have 5 cores per executor, 6/5 ≈ 1.2 GB of memory per core. The maximum partition size that can be accommodated is therefore 1.2 GB. Is my thought process correct?
@afaqueahmad7117 2 months ago ⁺¹
Copying the same answer as in the previous comment for the community :)
"""
Hi @remedyiq8034, this is a very valid point and thanks for highlighting this. You're absolutely right about ~1.2GB memory per core. My mind was referring to execution memory but I really appreciate your attention to the breakdown of the `--executor-memory` into its various components, which I should have explained more clearly before doing the memory per core calculation. I'll look into adding an info card to make this clear in the video. Thanks again for your sharp observation!
"""
@Amarjeet-fb3lk 2 months ago
Hi, I watched this video till the end.
Very good explanation.
But I have the doubts below.
If the number of cores is 5 per executor,
at shuffle time Spark creates 200 partitions by default. How will those 200 partitions be created if the number of cores is smaller, given that 1 partition is stored on 1 core?
Suppose that
my config is 2 executors, each with 5 cores.
Now, how will it create 200 partitions if I do a group by operation?
There are 10 cores, and 200 partitions are required, right?
How is that possible?
@afaqueahmad7117 2 months ago
Hi @Amarjeet-fb3lk, thanks again for the kind words. Regarding your question, you're right in stating that 1 partition will be processed by 1 core. Given that the configuration you shared has 2 * 5 = 10 cores in total, it is not necessary for the number of cores to match the number of partitions exactly at any given moment. Spark will create 200 partitions during shuffle by default and it will manage the execution of those 200 partitions by scheduling the tasks in chunks based on resource availability: first assigning 10 partitions to the 10 cores and, once those cores are freed, the next 10, and so on, until all 200 partitions are processed.
@Amarjeet-fb3lk 2 months ago
@afaqueahmad7117 Thanks for your response, Afaque. Learning and going deep into the topics is bringing me lots of doubts and questions.
Thanks for the answer, highly appreciate it.
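The chunked scheduling described in the reply to this thread can be sketched as an idealised model in which tasks run in full waves. The function names are hypothetical, and the real scheduler hands out a new task whenever an individual core frees up rather than waiting for a whole wave to finish.

```python
import math


def task_waves(num_partitions, total_cores):
    """Number of idealised waves needed to process every partition."""
    return math.ceil(num_partitions / total_cores)


def wave_sizes(num_partitions, total_cores):
    """Tasks started in each wave when at most `total_cores` tasks run at once."""
    sizes = []
    remaining = num_partitions
    while remaining > 0:
        batch = min(total_cores, remaining)
        sizes.append(batch)
        remaining -= batch
    return sizes


if __name__ == "__main__":
    total_cores = 2 * 5  # 2 executors with 5 cores each, as in the thread
    print(task_waves(200, total_cores))        # 20
    print(wave_sizes(200, total_cores)[:3])    # [10, 10, 10]
```

With a partition count that is not a multiple of the core count, the last wave is simply smaller, which leaves some cores idle at the end of the stage.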
@vikastangudu712 3 months ago
Great Video, Thanks for the Explanation,
But how would a fat exec improve Data Locality ?
A node can be broken into 11 exec or 1 exec, The HDFS storage or some other storage within the node is still the same for all the exec inside the node.
Data Locality talks about the storage not memory. Thus Fat/Thin --> No effect on Data Locality.
@rambabuposa5082 3 months ago ⁺¹
Because a fat executor has more memory, it can store more partitions of your dataset, so less shuffling of data is required; it also increases data locality (i.e. most of the partitions it needs are stored within that fat executor).
@afaqueahmad7117 3 months ago
Hey @vikastangudu712, you're correct in saying that data locality talks about "storage". However, what I'm referring to is that the interplay with "memory" becomes important once data is loaded in memory, in the sense of "how much" data can be processed without the overhead of loading data from disk again. Several operations are going to benefit from this "memory" locality.
In Spark, the best form of locality is `PROCESS_LOCAL`, which means that the data required for a task is present in the memory of the same JVM. Therefore, fat executors occupying most of the node's memory would benefit in this case, given that the chances of data being present in the same JVM increase.
Hope this clarifies :)
@tushibhaque863 1 month ago
Thanks, and please provide contact details. Also, do you take classes?
@afaqueahmad7117 28 days ago
Hey @tushibhaque863, appreciate the kind words. At this moment, I do not take classes, but if you have any questions feel free to shoot an email or comment here in this thread :)
@ranvijaymehta 3 months ago
Thank you
@afaqueahmad7117 3 months ago
Appreciate it, @ranvijaymehta :)
