Hi, please could you help? I'm running a Jupyter notebook with the Spark driver and executors inside Kubernetes, but once I try to run the .show() command the Spark driver does not execute the job and it gets stuck.
@Holden, we are trying to use Spark 2.4 in client mode through a Jupyter notebook. We are able to use it via spark-submit (cluster mode from the server). The Jupyter notebook is deployed in the k8s cluster itself, and we are able to use it by creating master/worker static pods. Currently we are facing an issue: as we run the task, the pod gets created and then destroyed, and the task hangs.
The following code is used in the Jupyter notebook:
import os
import pyspark
import socket

os.getcwd()

%%time
conf = pyspark.conf.SparkConf()
conf.setMaster("k8s://x.x.x.x:6443") \
    .set("spark.submit.deployMode", "client") \
    .set("spark.kubernetes.namespace", "spark-project1") \
    .set("spark.driver.host", socket.gethostbyname(socket.gethostname())) \
    .set("spark.driver.port", "7787") \
    .set("spark.kubernetes.container.image", "x.x.x.x:5000/spark/spark-py:v1") \
    .set("spark.executor.instances", "3") \
    .set("spark.kubernetes.pyspark.pythonVersion", "3") \
    .set("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")

%%time
sc = pyspark.context.SparkContext.getOrCreate(conf=conf)
sc

%%time
rdd = sc.parallelize(range(10))
print(rdd.sumApprox(10))  # <-- getting stuck over here
The difference we observed versus spark-submit is that in client mode no driver pod is initiated. Is this the expected behavior?
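For what it's worth, in client mode the notebook process itself acts as the driver, so the absence of a driver pod is expected; the executors just need a route back to it. A minimal sketch of the relevant settings follows (the port numbers here are assumptions; adjust them to your cluster):

```python
import socket

# In client mode the driver runs inside the notebook process; no driver pod
# is created. Executors connect back to the driver, so advertise a reachable
# address and pin the ports so they can be exposed (e.g. via a Service).
try:
    driver_host = socket.gethostbyname(socket.gethostname())
except OSError:
    driver_host = "127.0.0.1"  # fallback for illustration only

client_mode_settings = {
    "spark.submit.deployMode": "client",
    # the property is spark.driver.host (not spark.driver.hostname), and the
    # value must be the evaluated address, not the quoted expression
    "spark.driver.host": driver_host,
    "spark.driver.port": "7787",               # driver RPC endpoint
    "spark.driver.blockManager.port": "7788",  # block manager traffic
}

for key, value in client_mode_settings.items():
    print(f"{key}={value}")
```

If any of these ports are unreachable from the executor pods, jobs tend to hang exactly the way described above: the executors start, fail to phone home, and get torn down.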
So doing the notebook in non-client mode I'm not super sure; I haven't done that deployment. I'd try asking on the user@ list, and if that doesn't go anywhere, send me an e-mail personally and I'll explore it some with Ilan.
@HoldenKarau thanks, I did some troubleshooting; it was an issue with reverse routing between Jupyter and the k8s cluster.
It's resolved now.
@mayurthakur29 Awesome :)
@mayurthakur29 Hi! I'm having the same problem that you describe. How exactly did you resolve it? Thank you!
@SuperLano98 Run your Jupyter from the master node of the k8s cluster.
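If moving Jupyter to the master node isn't practical, the usual fix for this reverse-routing problem is a headless Service pointing at the notebook pod, so executors can resolve and reach the driver. A sketch, where the Service name, labels, and ports are all assumptions you'd replace with your own:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: jupyter-driver          # hypothetical name
  namespace: spark-project1
spec:
  clusterIP: None               # headless: DNS resolves directly to the pod
  selector:
    app: jupyter                # must match your Jupyter pod's labels
  ports:
    - name: driver-rpc
      port: 7787
    - name: blockmanager
      port: 7788
```

With that in place, set spark.driver.host to the Service's DNS name (here jupyter-driver.spark-project1.svc.cluster.local) and keep spark.driver.port and spark.driver.blockManager.port pinned to the exposed ports.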
@Holden, can you help with my understanding of client deploy mode? Let's say I'm running a Jupyter notebook on my laptop: would running in client deploy mode and specifying a driver pod name create a driver pod and keep it alive once the job is complete, so I can keep submitting more jobs (an interactive spark-shell-like experience)? When I tried doing this I got an exception:
ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: No pod was found named Some(spark-pi-driver) in the cluster in the namespace spark1 (this was supposed to be the driver pod.)
I saw some documentation on in-cluster client mode; is that feature related to this? I tried setting deploy mode to in-cluster-client and it didn't work; it expects either client or cluster.
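For context, my reading of the Spark-on-Kubernetes docs (so treat this as an assumption, not a definitive answer): in client mode Spark never creates a driver pod, and spark.kubernetes.driver.pod.name is only meant for the in-cluster case, where it names the already-running pod the driver lives in so executor pods can be owner-referenced to it. From a laptop there is no such pod, which would explain the exception above. A sketch of the two variants:

```python
# Hedged sketch: both property sets assume Spark 2.4 on Kubernetes.

# Client mode from a laptop: no driver pod exists, so do NOT set a driver
# pod name; Spark would look that pod up and fail exactly as above.
laptop_client = {
    "spark.submit.deployMode": "client",
    "spark.driver.host": "x.x.x.x",  # an address the executors can reach
}

# "In-cluster client mode": the driver already runs inside a pod (e.g. the
# Jupyter pod); naming it lets Spark tie executor pods to it for cleanup.
in_cluster_client = {
    "spark.submit.deployMode": "client",  # still "client"; there is no
                                          # separate "in-cluster-client" value
    "spark.kubernetes.driver.pod.name": "jupyter-0",  # hypothetical pod name
}

print(sorted(laptop_client) + sorted(in_cluster_client))
```

Either way the pod is never created for you and never kept alive between jobs; keeping a long-lived notebook pod around is how people get the interactive experience you describe.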
I know it's been a long time, I just did a blog post around this - scalingpythonml.com/2020/12/21/running-a-spark-jupyter-notebooks-in-client-mode-inside-of-a-kubernetes-cluster-on-arm.html :)