16 Understand Spark Execution on Cluster

  • Published 7 Sep 2024

Comments • 19

  • @easewithdata
    @easewithdata  10 months ago +3

    Note: For standalone clusters, the --num-executors parameter may not always work.
    So, to control the number of executors:
    1. define the number of cores per executor with the --executor-cores parameter (spark.executor.cores)
    2. cap the total number of cores for the application with the --total-executor-cores parameter (spark.cores.max)
    For example, if you need 3 executors with 2 cores each (no --num-executors needed):
    --executor-cores 2 --total-executor-cores 6
    The --num-executors parameter can be used to control the number of executors with the YARN resource manager. No need to worry, as we will work more with Spark cluster configuration in future sessions. (See the PySpark sketch below.)
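    For readers following along in PySpark rather than on the spark-submit command line, here is a minimal sketch of the same sizing using the equivalent session configs. The master URL spark://localhost:7077 is an assumption; adjust it to your cluster.

        # Minimal sketch, assuming a standalone master at spark://localhost:7077.
        from pyspark.sql import SparkSession

        spark = (
            SparkSession.builder
            .appName("standalone-executor-sizing")
            .master("spark://localhost:7077")      # standalone cluster manager (assumed URL)
            .config("spark.executor.cores", "2")   # cores per executor (--executor-cores)
            .config("spark.cores.max", "6")        # total cores for this app (--total-executor-cores)
            .getOrCreate()
        )

        # With 6 total cores and 2 cores per executor, the standalone scheduler
        # should start 3 executors for this application (resources permitting).
        print(spark.sparkContext.defaultParallelism)

        spark.stop()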

    • @kunalnandwana4280
      @kunalnandwana4280 6 months ago

      @easewithdata How are you running cluster mode on a local machine? I mean, where are you getting this many resources from?

    • @satishkumarparida4797
      @satishkumarparida4797 4 months ago

      Same question as Kunal: how are you running cluster mode on a local machine? A little bit of context would be good here.

    • @easewithdata
      @easewithdata  4 months ago

      Hello Kunal & Satish,
      I have a 4-core, 8-processor machine. Docker uses hyperthreading to enable multi-processing on the same core, which is why you see 16 cores (2 threads per processor) available in the cluster. Also, Docker doesn't allocate the host machine's full resources to the containers, only a percentage of them, which can be controlled with parameters.
      You can learn more about this in the Docker documentation.
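      As a quick check of what is described above, the snippet below (an illustrative sketch, not from the video) prints the logical CPU count; running it on the host and again inside a worker container shows how many hardware threads each environment exposes.

          # Minimal sketch: count the logical CPUs (hardware threads) visible
          # in the current environment, e.g. on the host vs. inside a container.
          import os

          print("Logical CPUs visible here:", os.cpu_count())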

  • @gyanaranjannayak3333
    @gyanaranjannayak3333 4 months ago +1

    Can you please explain how both the master node and the two worker nodes run on the same machine?

    • @easewithdata
      @easewithdata  4 months ago

      Hello,
      I am using Docker to run both the master and the worker nodes as Docker containers.

  • @Kevin-nt4eb
    @Kevin-nt4eb a month ago

    So in cluster deploy mode, the driver program is submitted inside an executor that sits inside the cluster. Am I right?

    • @easewithdata
      @easewithdata  a month ago

      The spark-submit command launches the driver, not the executors.
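      One way to see where the driver ends up is to read the deploy mode from inside the running application. A minimal sketch, assuming the app was launched with spark-submit (spark.submit.deployMode reports "client" when the driver runs on the submitting machine and "cluster" when it runs inside the cluster):

          # Minimal sketch: report the deploy mode of the current application.
          from pyspark.sql import SparkSession

          spark = SparkSession.builder.appName("deploy-mode-check").getOrCreate()
          # spark.submit.deployMode is set by spark-submit; default to "client" if absent.
          print(spark.sparkContext.getConf().get("spark.submit.deployMode", "client"))
          spark.stop()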

  • @shivakant4698
    @shivakant4698 2 months ago

    Where is Spark's standalone cluster running, on Docker or somewhere else? Please tell me why my cluster execution code is not running.

    • @easewithdata
      @easewithdata  2 months ago

      The standalone cluster used in this tutorial runs on Docker. You can set it up yourself.
      For the notebook - hub.docker.com/r/jupyter/pyspark-notebook
      You can use the Docker files below to set up the cluster:
      github.com/subhamkharwal/docker-images/tree/master/spark-cluster-new

    • @adulterrier
      @adulterrier 27 days ago

      @@easewithdata this link is not valid. I assume you mean "pyspark-cluster-with-jupyter"?

  • @gyanaranjannayak3333
    @gyanaranjannayak3333 4 months ago +1

    How are you running this Spark standalone cluster? Have you installed Spark on your system separately, or something else? I am using pip install pyspark right now. What do I have to do to use a standalone cluster like you are doing?

    • @easewithdata
      @easewithdata  4 months ago

      Hello,
      I am using Docker containers to run a standalone cluster.
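      If you already have pip install pyspark, the only change needed to use a standalone cluster is to point the session at the master URL instead of local[*]. A minimal sketch, assuming the master's port 7077 is reachable on localhost (check the master web UI, usually on port 8080, for the exact spark:// address):

          # Minimal sketch: connect a pip-installed PySpark session to a
          # standalone master running in Docker (assumed at spark://localhost:7077).
          from pyspark.sql import SparkSession

          spark = (
              SparkSession.builder
              .appName("connect-to-standalone")
              .master("spark://localhost:7077")   # standalone master instead of local[*]
              .getOrCreate()
          )

          print(spark.range(10).count())  # trivial job to confirm executors are attached
          spark.stop()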

    • @gyanaranjannayak3333
      @gyanaranjannayak3333 4 months ago

      @@easewithdata Are the master, the worker (slave), and the executors all running on the same machine?

  • @bhavishyasharma998
    @bhavishyasharma998 3 months ago

    Hi, can you please explain how a DataFrame with 10 columns gets partitioned into 11 parts when 2 executors with 8 cores each, i.e. 16 cores in total, are processing it?

    • @easewithdata
      @easewithdata  3 months ago

      DataFrames/data are not partitioned based on the number of columns. They are partitioned based on rows of data (horizontal partitioning).
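      To make this concrete, here is a minimal sketch (not from the video) showing that the number of columns has no bearing on the number of partitions; the partition count comes from how the rows are split (input source, shuffle settings, or explicit repartitioning):

          # Minimal sketch: partitioning is about rows, not columns.
          from pyspark.sql import SparkSession

          spark = SparkSession.builder.appName("partitioning-demo").getOrCreate()

          df = spark.range(1_000_000)                                 # 1 column, 1M rows
          wide = df.selectExpr(*[f"id as c{i}" for i in range(10)])   # 10 columns, same rows

          # Column count does not change the partition count.
          print(df.rdd.getNumPartitions(), wide.rdd.getNumPartitions())

          # 11 partitions by explicit choice, still 10 columns.
          print(wide.repartition(11).rdd.getNumPartitions())

          spark.stop()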

    • @bhavishyasharma998
      @bhavishyasharma998 3 months ago

      @@easewithdata ok thanks