Deploying machine learning models on Kubernetes

  • Added on 10. 07. 2024
  • In this video, we will go through a simple end-to-end example of how to deploy an ML model on Kubernetes. We will use a pretrained Transformer model on the task of masked language modelling (fill-mask) and turn it into a REST API. Then we will containerize our service and finally deploy it on a Kubernetes cluster. (A minimal illustrative sketch of the API step follows below the description.)
    Code from the video:
    github.com/jankrepl/mildlyove...
    00:00 Intro
    00:22 3 step procedure diagram
    01:42 Existing framework overview
    02:09 Creating an API
    09:25 Containerization
    13:53 Containerization - custom platform
    15:47 Preparing a minikube K8s cluster
    17:43 K8s: Deployment and service
    21:31 K8s: 2 cool features - self-healing and load balancing
    26:00 Outro
    If you have any video suggestions or you just want to chat, feel free to join the Discord server: / discord
    Twitter: / moverfitted
    Credits: logo animation
    Title: Conjungation · Author: Uncle Milk · Source: / unclemilk · License: creativecommons.org/licenses/... · Download (9MB): auboutdufil.com/?id=600
  • Science & Technology
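
As a rough illustration of the first step in the video (turning the pretrained fill-mask model into a REST API), here is a minimal sketch assuming FastAPI, uvicorn and the Hugging Face transformers pipeline; the model name and endpoint path are placeholder choices and are not taken from the video or the linked repository.

    # Minimal illustrative sketch of a fill-mask REST API (not the repo code).
    # Assumes fastapi, uvicorn, torch and transformers are installed.
    from fastapi import FastAPI
    from pydantic import BaseModel
    from transformers import pipeline

    app = FastAPI()
    fill_mask = pipeline("fill-mask", model="distilbert-base-uncased")  # placeholder model

    class PredictRequest(BaseModel):
        text: str  # must contain the model's mask token, e.g. "[MASK]"

    @app.post("/predict")
    def predict(request: PredictRequest):
        # Return the top candidate fills for the masked position.
        return fill_mask(request.text)

    # Run locally with: uvicorn app:app --host 0.0.0.0 --port 8000

Usage example: POST {"text": "Kubernetes is really [MASK]."} to /predict and the response lists the highest-scoring replacements for the mask token.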

Comments • 49

  • @abdjanshvamdjsj
    @abdjanshvamdjsj 11 months ago +8

    Brooooo this was so good.

  • @ludwigstumpp
    @ludwigstumpp 1 year ago +7

    Always a pleasure to watch someone as talented as you! Keep it up :)

  • @JoseMiguel_____
    @JoseMiguel_____ 1 year ago +1

    You're great. Thanks for sharing this in such a nice way.

  • @mmacasual-
    @mmacasual- 1 month ago +1

    Great example. Thanks for the information

  • @davidyates4857
    @davidyates4857 1 year ago +1

    Great video very informative.

  • @aditya_01
    @aditya_01 5 months ago +1

    Great video, thanks a lot, really liked the explanation!!!

  • @maksim3285
    @maksim3285 11 months ago +1

    Thank you, it helped me a lot.

  • @fizipcfx
    @fizipcfx 1 year ago +2

    he is back 🎉

  • @kwang-jebaeg2460
    @kwang-jebaeg2460 1 year ago

    OH !!!!! Glad to meet you again !!!!

  • @vishalgoklani
    @vishalgoklani 1 year ago +1

    Welcome back, we missed you!

  • @thinkman2137
    @thinkman2137 7 months ago +1

    Thank you for the detailed tutorial!

    • @thinkman2137
      @thinkman2137 7 months ago

      But TorchServe now has Kubernetes integration.

    • @mildlyoverfitted
      @mildlyoverfitted  7 months ago

      I will definitely look into it:) Thank you for pointing it out!!

  • @johanngerberding5956
    @johanngerberding5956 1 year ago

    very cool video!

  • @shivendrasingh9759
    @shivendrasingh9759 1 month ago +1

    Really helpful as a foundation for MLOps.

  • @user-cp1pe2tx7h
    @user-cp1pe2tx7h 1 year ago

    Great!

  • @lauraennature
    @lauraennature 1 year ago

    New video 🤩

  • @nehetnehet8109
    @nehetnehet8109 1 year ago

    Really good

  • @nehetnehet8109
    @nehetnehet8109 1 year ago

    Great

  • @user-ds5sh9uj7o
    @user-ds5sh9uj7o 1 year ago +3

    Would appreciate a video using VS Code that covers the Docker container files, the K8s files, and FastAPI.

  • @evab.7980
    @evab.7980 1 year ago

    👏👏👏

  • @unaibox1350
    @unaibox1350 1 year ago +1

    Amazing video. At minute 5:25, how did you open the second bash pane in the console? I was searching for a long time and couldn't find anything. Thanks and regards!

    • @mildlyoverfitted
      @mildlyoverfitted  1 year ago +1

      Thank you! You need to install a tool called tmux. One of its features is that you can have multiple panes on a single screen.

    • @unaibox1350
      @unaibox1350 1 year ago +1

      @@mildlyoverfitted Thank you! Will dig into it now

  • @kwang-jebaeg2460
    @kwang-jebaeg2460 1 year ago +1

    Looking forward to seeing your face a lot :))

  • @davidpratr
    @davidpratr 4 months ago

    Really nice video. Would you see any benefit in using this deployment on a single node with an M1 chip? I'd say somewhat yes, because a single inference might not take all of the M1's CPU, but what about scaling the model in terms of RAM? One of those models might take 4-7 GB of RAM, which adds up to 21 GB of RAM for just 3 pods. What's your opinion on that?

    • @mildlyoverfitted
      @mildlyoverfitted  4 months ago +1

      Glad you liked the video! Honestly, I filmed the video on my M1 using minikube mostly because of convenience. But on real projects I have always worked with K8s clusters that had multiple nodes. So I cannot really advocate for the single node setup other than for learning purposes.

    • @davidpratr
      @davidpratr 4 months ago +1

      @@mildlyoverfitted Got it. So very likely more requests could be served at the same time, but with very limited scalability and probably some performance loss. By the way, what are those fancy terminal combos? Is it tmux?

    • @mildlyoverfitted
      @mildlyoverfitted  4 months ago +1

      @@davidpratr interesting :) yes, it is tmux :)

  • @unaibox1350
    @unaibox1350 1 year ago

    I am having a problem at minute 18:00: the model load is being killed all the time. I tried "minikube config set memory 4096" but I still have the same problem. Any idea? I've been looking for a solution for 3 hours with no luck.

    • @mildlyoverfitted
      @mildlyoverfitted  1 year ago

      Hm, I haven't had that problem myself. However, yeah, it might be related to a lack of memory.

  • @alivecoding4995
    @alivecoding4995 1 year ago

    What terminal application is this, with the different panels?

  • @zhijunchen1248
    @zhijunchen1248 11 months ago +1

    Hi, I would like to use a GPU to accelerate this demo. Can you give me some tips? Thank you

    • @mildlyoverfitted
      @mildlyoverfitted  11 months ago

      So if you want to use minikube, this seems to be the solution: minikube.sigs.k8s.io/docs/handbook/addons/nvidia/

    • @zhijunchen1248
      @zhijunchen1248 11 months ago +1

      @@mildlyoverfitted Thank you, I used the "--device" flag of transformers-cli to enable the GPU. I found that the serving app takes up almost all of the GPU memory but no compute power. Anyway, thank you for your video!
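
      For reference, when calling the model from your own Python code rather than through transformers-cli, the transformers pipeline can be placed on a GPU via its device argument; a minimal sketch, assuming a CUDA-capable GPU and a CUDA build of PyTorch:

        # Illustrative sketch: running the fill-mask pipeline on a GPU.
        import torch
        from transformers import pipeline

        device = 0 if torch.cuda.is_available() else -1  # 0 = first GPU, -1 = CPU
        fill_mask = pipeline("fill-mask", model="distilbert-base-uncased", device=device)

        print(fill_mask("Kubernetes schedules [MASK] onto nodes."))

      Inside the cluster, the pod additionally needs access to the GPU, which is what the minikube NVIDIA addon linked above provides.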