Deploying machine learning models on Kubernetes
- added 10 Jul 2024
- In this video, we will go through a simple end-to-end example of how to deploy an ML model on Kubernetes. We will use a pretrained Transformer model for the task of masked language modelling (fill-mask) and turn it into a REST API. Then we will containerize our service and finally deploy it to a Kubernetes cluster.
Code from the video:
github.com/jankrepl/mildlyove...
00:00 Intro
00:22 3 step procedure diagram
01:42 Existing framework overview
02:09 Creating an API
09:25 Containerization
13:53 Containerization - custom platform
15:47 Preparing a minikube K8s cluster
17:43 K8s: Deployment and service
21:31 K8s: 2 cool features - self-healing and load balancing
26:00 Outro
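The Kubernetes chapters (Deployment and Service, plus the self-healing and load-balancing demo) boil down to two manifests. A hedged sketch of what they might look like — image name, labels, and ports are assumptions, not taken from the video:

```yaml
# Illustrative manifests only; names, image, and ports are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fill-mask-api
spec:
  replicas: 3            # several pods -> traffic can be balanced across them
  selector:
    matchLabels:
      app: fill-mask-api
  template:
    metadata:
      labels:
        app: fill-mask-api
    spec:
      containers:
        - name: fill-mask-api
          image: fill-mask-api:latest
          ports:
            - containerPort: 8000
---
apiVersion: v1
kind: Service
metadata:
  name: fill-mask-api
spec:
  selector:
    app: fill-mask-api   # routes requests to any matching, healthy pod
  ports:
    - port: 80
      targetPort: 8000
```

This is also where the two demoed features come from: if you `kubectl delete` one pod, the Deployment controller recreates it to keep the replica count at 3 (self-healing), and the Service spreads incoming requests across the pods (load balancing).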
If you have any video suggestions or you just want to chat, feel free to join the Discord server: / discord
Twitter: / moverfitted
Credits logo animation
Title: Conjungation · Author: Uncle Milk · Source: / unclemilk · License: creativecommons.org/licenses/... · Download (9MB): auboutdufil.com/?id=600 - Science & Technology
Brooooo this was so good.
Glad you liked it!
Always a pleasure to watch someone as talented as you! Keep it up :)
Wow, much appreciated:) Thanks:)
You're great. Thanks for sharing this in such a nice way.
My pleasure!
Great example. Thanks for the information
My pleasure!
Great video very informative.
Glad you liked it!
Great video, thanks a lot, really liked the explanation!!!
Glad it was helpful!
Thank you, it helped me a lot.
Happy to hear that!
he is back 🎉
OH !!!!! Glad to meet you again !!!!
Glad you are here:))
Welcome back, we missed you!
Hehe, thank you! Nice to hear that:)
I agree!
Thank you for the detailed tutorial!
But TorchServe now has Kubernetes integration
I will definitely look into it:) Thank you for pointing it out!!
very cool video!
Thank you! Cheers!
Really helpful as a foundation for MLOps
Glad to hear that!
Great!
New video 🤩
Really good
Great
Would appreciate a video using VS Code that covers the Docker container files, the K8s files, and FastAPI
👏👏👏
Amazing video. At minute 5:25, how did you open the second bash in the console? I was searching for a long time and couldn't find anything. Thanks and regards!
Thank you! You need to install a tool called tmux. One of its features is that you can have multiple panes on a single screen.
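For reference, the default pane-splitting key bindings in a stock tmux configuration (assuming no custom config) are:

```shell
tmux                  # start a new session
# inside tmux (the default prefix is Ctrl-b):
#   Ctrl-b %   split the current pane vertically (side by side)
#   Ctrl-b "   split the current pane horizontally (stacked)
#   Ctrl-b o   move focus to the next pane
```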
@@mildlyoverfitted Thank you! Will dig in it now
Looking forward to seeing your face a lot :))
Really nice video. Would you see any benefit in using the deployment on a single node with an M1 chip? I'd say somehow yes, because a single inference might not take all the CPU of the M1 chip, but what about scaling the model in terms of RAM? One of those models might take 4-7 GB of RAM, which makes up to 21 GB of RAM for only 3 pods. What's your opinion on that?
Glad you liked the video! Honestly, I filmed the video on my M1 using minikube mostly because of convenience. But on real projects I have always worked with K8s clusters that had multiple nodes. So I cannot really advocate for the single node setup other than for learning purposes.
@@mildlyoverfitted got it. So very likely more requests could be resolved at the same time, but with very limited scalability and probably some performance loss. By the way, what are those fancy combos in the terminal? Is it tmux?
@@davidpratr interesting:) yes, it is tmux:)
I am having a problem at minute 18:00: the model load is being killed all the time. I tried "minikube config set memory 4096" but I still have the same problem. Any idea? I've been looking for a solution for 3 hours and there is no way
Hm, I haven't had that problem myself. However, yeah, it might be related to a lack of memory.
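One thing worth checking (an assumption about the commenter's setup, not something shown in the video): `minikube config set memory` only applies to clusters created afterwards, so the existing cluster has to be recreated for the new limit to take effect:

```shell
# minikube config changes only apply to newly created clusters,
# so delete and recreate the cluster after raising the memory limit.
minikube config set memory 4096
minikube delete
minikube start
```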
What terminal application is this, with the different panels?
tmux
Hi, I would like to use GPU to accelerate this demo, can you give me some tips? Thank you
So if you wanna use minikube this seems to be the solution. minikube.sigs.k8s.io/docs/handbook/addons/nvidia/
@@mildlyoverfitted thank you, I use the "--device" flag of transformers-cli to enable the GPU. And I found that the serving app takes up almost all the GPU memory and no compute power. Whatever, thank you for your video!