Note: For standalone clusters, the --num-executors parameter may not always work.
So, to control the number of executors:
1. define the number of cores per executor with the --executor-cores parameter (spark.executor.cores)
2. control the max number of cores for the application with the --total-executor-cores parameter (spark.cores.max)
For example, if you need 3 executors with 2 cores each (you don't need --num-executors):
--executor-cores 2 --total-executor-cores 6
The --num-executors parameter can be used to control the number of executors under the YARN resource manager. No need to worry, as we will work more with Spark cluster configuration in future sessions.
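For context, here is the same sizing expressed programmatically, as a minimal sketch; the master URL and app name are placeholders, not from the tutorial:

    from pyspark.sql import SparkSession

    # spark.executor.cores maps to --executor-cores,
    # spark.cores.max maps to --total-executor-cores
    spark = (
        SparkSession.builder
        .master("spark://spark-master:7077")   # placeholder standalone master URL
        .config("spark.executor.cores", "2")   # 2 cores per executor
        .config("spark.cores.max", "6")        # 6 cores max -> 3 executors of 2 cores
        .appName("executor-sizing-demo")       # placeholder app name
        .getOrCreate()
    )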
@easewithdata How are you running cluster mode on a local machine? I mean, where are you getting this many resources from?
Same question as Kunal: how are you running cluster mode on a local machine? A little bit of context would be good here.
Hello Kunal & Satish,
I have a 4-core, 8-processor machine. Docker utilizes hyperthreading to enable multi-processing on the same core, which is why you see 16 cores (2 threads per processor) available in the cluster. Also, Docker doesn't allocate the host machine's complete resources to containers, but rather a percentage of them, which can be controlled using parameters (e.g., the --cpus and --memory flags of docker run).
You can learn more about this in the Docker documentation.
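If you want to verify the logical vs. physical core count on your own machine, a quick check from Python (psutil is a third-party package you would install separately) could look like this:

    import os
    import psutil  # third-party: pip install psutil

    print(os.cpu_count())                   # logical cores (threads), e.g. 16
    print(psutil.cpu_count(logical=False))  # physical cores, e.g. 8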
Can you please tell how both master node and two workers node run on same machine?
Hello,
I am using Docker to run both the master and the worker nodes as containers.
So in cluster deploy mode, the driver program is submitted inside an executor which is present inside the cluster. Am I right?
The spark-submit command launches the driver program, not the executors. The driver does not run inside an executor; in cluster deploy mode it runs as its own process on one of the cluster's worker nodes.
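As a small sanity check, a submitted application can inspect how its own driver was deployed. A minimal sketch, assuming an active session:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # 'client': driver runs where spark-submit was invoked;
    # 'cluster': driver runs on a worker node inside the cluster
    print(spark.conf.get("spark.submit.deployMode", "client"))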
Where is Spark's standalone cluster running, on Docker or somewhere else? Please tell me, why is my cluster execution code not running?
The standalone cluster used in this tutorial runs on Docker. You can set it up yourself.
For notebook - hub.docker.com/r/jupyter/pyspark-notebook
You can use the below Docker files to set up the cluster:
github.com/subhamkharwal/docker-images/tree/master/spark-cluster-new
@easewithdata this link is not valid. I assume you mean "pyspark-cluster-with-jupyter"?
How are you running this Spark standalone cluster? Have you installed Spark on your system separately, or something else? I am using pip install pyspark right now. What do I have to do to use a standalone cluster like you are doing?
Hello,
I am using Docker containers to run the standalone cluster.
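For reference, a pip-installed PySpark can use a standalone cluster simply by pointing the session at the master URL. A minimal sketch; the host and port are placeholders for your own setup:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("spark://localhost:7077")  # placeholder: your standalone master URL
        .appName("standalone-client-demo")
        .getOrCreate()
    )
    print(spark.version)  # confirms the session is up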
@easewithdata Are the master, worker, and executor processes all running on the same machine?
Hi, can you please explain how a dataframe with 10 columns gets partitioned into 11 parts, with 2 executors of 8 cores each (i.e., 16 cores in total) processing it?
Dataframes/data are not partitioned based on the number of columns. They are partitioned based on the data itself, i.e., by rows (horizontal partitioning).
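To make that concrete, here is a minimal sketch showing that the partition count comes from how the rows are split, not from the column count:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.range(1000)  # a single-column dataframe with 1000 rows

    print(df.rdd.getNumPartitions())    # driven by parallelism/input splits, not columns
    df11 = df.repartition(11)           # redistribute the rows into 11 partitions
    print(df11.rdd.getNumPartitions())  # 11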
@easewithdata ok thanks