Spark and Machine Learning on Kubernetes in AWS - Hands on webinar
- Published 12. 09. 2024
- Link to channel for other videos and to subscribe - / aiengineeringlife
In this webinar we will learn how to run Spark on Kubernetes for machine learning workloads. Through the webinar, you will learn:
Why Spark on Kubernetes?
AWS Kubernetes Services Overview
Hands-on Demo
Facebook Prophet Model on Spark
Dependency Management
Creating EKS Cluster
Building Containers
Running Spark on Kubernetes
Code used in this video can be found here - github.com/pradeep-misra/spark-k8s
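For readers following along, the "Running Spark on Kubernetes" step outlined above typically boils down to a spark-submit against the cluster's API server. A minimal sketch, assuming a pre-built container image and a `spark` service account; the endpoint, image name, and application path are placeholders, not the webinar's actual values:

```shell
# Submit a PySpark job (e.g. the Prophet forecasting app) to an EKS cluster.
# The API server URL, ECR image, and app path below are illustrative placeholders.
spark-submit \
  --master k8s://https://<your-eks-api-server-endpoint> \
  --deploy-mode cluster \
  --name prophet-forecast \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<account>.dkr.ecr.<region>.amazonaws.com/spark-prophet:latest \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  local:///opt/spark/work-dir/app.py
```

The `local://` scheme tells Spark the application file is already inside the container image rather than on the submitting machine.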
Nice Demo
Some questions I have:
1) While the job is running, can we see the Spark UI?
2) Can we submit using Airflow and get the status of the job?
3) If we kill the EKS cluster, will the logs be stored or erased? And if they get erased, is there any way to recover them? E.g. suppose a job failed in a prod pipeline and I want to check the reason for the failure and debug. How can I?
4) How to check the memory utilisation of the Spark job?
Vaibhav.. You can do kubectl port forwarding for the Spark UI. I think you can use Airflow via the SparkSubmitOperator, though I have not tried it myself.
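The port-forwarding suggestion above can be sketched like this (the driver pod name is a placeholder; Spark on Kubernetes labels driver pods with `spark-role=driver`):

```shell
# Find the driver pod for the running job, then forward its UI port
# (4040 by default) to the local machine.
kubectl get pods -l spark-role=driver
kubectl port-forward <driver-pod-name> 4040:4040
# While the forward is active, open http://localhost:4040 in a browser
# to inspect stages, executors, and memory usage of the live job.
```

This also partially answers question 4, since the UI's Executors tab shows per-executor memory utilisation.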
Best practice is to have app-specific logs stored outside of the cluster for debugging. You can add a volume mount on k8s specifically for logs.
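One way to wire up such a volume mount is through Spark's built-in Kubernetes volume configuration, so driver logs survive pod teardown. A sketch only; the volume name, mount path, and claim name (`spark-logs-pvc`) are hypothetical and the PVC must already exist in the cluster:

```shell
# Mount a pre-created PersistentVolumeClaim into the driver pod so that
# anything the app writes under /var/log/spark outlives the pod.
spark-submit \
  --master k8s://https://<your-eks-api-server-endpoint> \
  --deploy-mode cluster \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-logs.mount.path=/var/log/spark \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-logs.options.claimName=spark-logs-pvc \
  local:///opt/spark/work-dir/app.py
```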
Thanks for this session, much needed.
Just wondering if it would be possible to have a session 2 that goes in depth, suggests alternatives for each step depending on the scale, and gives a little more depth on what is happening backstage :)
Sachin.. That is going to be a session of 2 or more hours then. AWS has now launched Spark on k8s support in EMR. Maybe I will see if that makes it seamless.
Excellent!
Awesome demo👌
Would you please create a video on deploying a machine learning model on a Spark cluster using PMML files?
@@akshayanand6803 Can you elaborate? You want to deploy a PMML file on Spark? What kind of model is it: a Spark ML model stored as PMML, or a regular Python-based model?
@@AIEngineeringLife I know it's late to respond, my humble apologies. This was for regular Python models saved as PMML files and then run on a test dataset in the Spark enterprise data lake environment. But I truly love the way you bring the knowledge 🙏🏻 My gratitude, learning through your videos.
Great explanation for a start, but I thought about using the Spark Operator for further automation, which I am doing now. Hope you can give insights... The reason is to use Prometheus for monitoring jobs, plus other benefits for further automation. Any thoughts? Thanks again!