Jupyter Notebook with Apache Spark 2.4 on Kubernetes example with GKE - client mode

  • added 26. 08. 2024

Comments • 11

  • @SonuKumar-fn1gn
    @SonuKumar-fn1gn 7 days ago

    Hi, please could you help? I'm using a Jupyter notebook with the Spark driver and executors inside Kubernetes, but once I try to run a .show() command the Spark driver does not execute the job and it gets stuck.

  • @mayurthakur29
    @mayurthakur29 5 years ago

    @Holden, we are trying to use Spark 2.4 in client mode through a Jupyter notebook; we are able to use it via spark-submit (cluster mode from the server). The Jupyter notebook is deployed in the k8s cluster itself, and we are able to use it by creating static master/worker pods. Currently we are facing an issue: as we run the task, the pod gets created and then destroyed, and the task hangs.
    The code below is used in the Jupyter notebook:
    import os
    import socket
    import pyspark
    os.getcwd()
    -
    %%time
    conf = pyspark.conf.SparkConf()
    conf.setMaster("k8s://x.x.x.x:6443") \
    .set("spark.submit.deployMode", "client") \
    .set("spark.kubernetes.namespace", "spark-project1") \
    .set("spark.driver.host", socket.gethostbyname(socket.gethostname())) \
    .set("spark.driver.port", "7787") \
    .set("spark.kubernetes.container.image", "x.x.x.x:5000/spark/spark-py:v1") \
    .set("spark.executor.instances", "3") \
    .set("spark.kubernetes.pyspark.pythonVersion", "3") \
    .set("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")
    -
    %%time
    sc = pyspark.context.SparkContext.getOrCreate(conf=conf)
    -
    sc
    -
    %%time
    rdd = sc.parallelize(range(10))
    -
    print(rdd.sumApprox(10))  # -----> getting stuck over here
    As we observed, the difference from spark-submit (cluster mode) is that no driver pod is initiated; is this the expected behavior?

    • @HoldenKarau
      @HoldenKarau 5 years ago

      So doing the notebook in non-client mode, I'm not super sure; I haven't done that deployment. I'd try asking on the user@ list, and if that doesn't go anywhere send me an e-mail personally and I'll explore it some with Ilan.

    • @mayurthakur29
      @mayurthakur29 5 years ago

      @@HoldenKarau Thanks, I did some troubleshooting; it was an issue with reverse routing between Jupyter and the k8s cluster.
      It's resolved now.
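
      The routing fix above amounts to making sure the executors can connect back to the driver. A minimal sketch of resolving and advertising the notebook pod's own address (a plain dict stands in for SparkConf; the port value follows the thread above, and this assumes the notebook itself runs inside the cluster so its pod IP is routable):

      ```python
      import socket

      # In client mode the executors connect back to the driver, so the driver
      # must advertise an address that is reachable from inside the cluster.
      # Resolving the notebook pod's own IP works when the notebook runs
      # in-cluster:
      driver_host = socket.gethostbyname(socket.gethostname())

      # Sketch of the relevant settings (illustrative, not a full config):
      driver_settings = {
          "spark.submit.deployMode": "client",
          "spark.driver.host": driver_host,  # the resolved IP, not a quoted string
          "spark.driver.port": "7787",       # fixed so executors know where to dial
      }
      ```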

    • @HoldenKarau
      @HoldenKarau 5 years ago

      @@mayurthakur29 Awesome :)

    • @SuperLano98
      @SuperLano98 4 years ago

      @@mayurthakur29 Hi! I'm having the same problem that you describe. How exactly did you resolve it? Thank you!

    • @mayurthakur29
      @mayurthakur29 4 years ago

      @@SuperLano98 Run your Jupyter from the master node of the k8s cluster.

  • @lovudude
    @lovudude 4 years ago

    @Holden, can you help with my understanding of client deploy mode? Let's say I'm running a Jupyter notebook on my laptop: would running in client deploy mode and specifying a driver pod name create a driver pod and keep it alive once the job is complete, so I can keep submitting more jobs (an interactive spark-shell-like experience)? When I tried doing this I got an exception: ERROR SparkContext: Error initializing SparkContext.
    org.apache.spark.SparkException: No pod was found named Some(spark-pi-driver) in the cluster in the namespace spark1 (this was supposed to be the driver pod.)

    • @lovudude
      @lovudude 4 years ago

      Saw some documentation on in-cluster client mode; is that feature related to this? I tried setting the deploy mode to in-cluster-client and it didn't work; it expects either client or cluster.
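
      The exception quoted above is consistent with how client mode works: Spark does not launch a driver pod in client mode, and spark.kubernetes.driver.pod.name only tells Spark which existing pod the driver is already running in. A hedged sketch with placeholder values (a plain dict, not a claim about the exact config used in this thread):

      ```python
      # Client-mode settings for a driver running OUTSIDE the cluster (e.g. a
      # laptop). In client mode Spark does not create a driver pod, so
      # spark.kubernetes.driver.pod.name should be left unset unless the driver
      # already runs inside a pod with exactly that name; pointing it at a pod
      # that does not exist yields "No pod was found named Some(...)".
      client_conf = {
          "spark.master": "k8s://https://x.x.x.x:6443",  # placeholder API server
          "spark.submit.deployMode": "client",
          "spark.kubernetes.namespace": "spark1",
          "spark.executor.instances": "2",
      }

      # Deliberately absent for an external driver:
      assert "spark.kubernetes.driver.pod.name" not in client_conf
      ```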

    • @HoldenKarau
      @HoldenKarau 3 years ago +1

      I know it's been a long time, I just did a blog post around this - scalingpythonml.com/2020/12/21/running-a-spark-jupyter-notebooks-in-client-mode-inside-of-a-kubernetes-cluster-on-arm.html :)
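
      A common companion to the in-cluster client-mode setup described in that blog post is a headless Service in front of the driver, so executors can reach it at a stable DNS name. This is a sketch with illustrative names and labels, not taken from the thread; the ports mirror the driver RPC port used above plus a block-manager port:

      ```yaml
      # Hypothetical headless Service exposing the notebook driver to executors.
      apiVersion: v1
      kind: Service
      metadata:
        name: jupyter-driver        # illustrative name
        namespace: spark-project1
      spec:
        clusterIP: None             # headless: DNS resolves straight to the pod IP
        selector:
          app: jupyter              # assumes the notebook pod carries this label
        ports:
          - name: driver-rpc
            port: 7787              # matches spark.driver.port in the thread
          - name: blockmanager
            port: 7788              # would be set via spark.blockManager.port
      ```

      With such a service, spark.driver.host can point at the service DNS name instead of a raw pod IP.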