Host your own LLM in 5 minutes on RunPod, and set up an API endpoint for it.

  • Published 28 Nov 2023
  • Link to Google Colab used in demo - colab.research.google.com/dri...
    Link to where the LLM can be hosted - www.runpod.io/
    Please note: if you are using this for anything other than testing, you should restrict access with an API key.
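
    As a hedged illustration of that last point, a client would pass the key in a request header; everything below is a placeholder, since the unrestricted setup shown in the video doesn't check one:

        import requests

        API_URL = "https://YOUR_POD_URL/api/v1/generate"  # placeholder endpoint
        API_KEY = "YOUR_SECRET_KEY"                       # placeholder key

        resp = requests.post(
            API_URL,
            # Only meaningful once something server-side enforces the token.
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"prompt": "Hello!", "max_new_tokens": 32},
        )
        print(resp.json())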

Comments • 24

  • @kaynkayn9870 • 4 months ago +1

    I couldn't get it running all day yesterday; your step-by-step approach is wonderful. Thank you.

    • @TomanswerAi • 4 months ago

      Great to hear it. Pleased it helped.

  • @hamanaldhekair5623 • 4 months ago +1

    God Bless you kind soul ❤

  • @nemai1337 • a month ago

    Isn't this a bit slow for a 7B model running on a (freakin'!) H100?
    I'm getting roughly the same speed here with an RTX 2070 and 5-bit quantized 7B models...
    Thanks for the tutorial though; I was going to look into RunPod anyway.

  • @user-pr6nm2di6d • 4 months ago +1

    I was really looking for something like this. Thank you so much. Can you make a video on how to use Agentkit by BCG?

    • @TomanswerAi • 4 months ago

      Great to hear it. Pleased it helped.

    • @TomanswerAi • 4 months ago

      I can take a look at Agentkit. Just need to find the time 😆

  • @user-nx7uh8db2g • 4 months ago +1

    Hi Thomas, can you provide guidance on how we select the GPU based on the model we would like to test? For example, if I want to test Goliath 120b at reasonable speeds, how do I know which GPUs to deploy? Thanks.

    • @TomanswerAi • 4 months ago

      Hi there (sorry, I don't have your name). To be honest, the best thing to do would be to look for details about the particular model on Hugging Face, YouTube, and Twitter, to see whether anyone has tested it on a particular machine. I'm afraid I'm no expert on specific models, and I would follow that same process myself to determine the GPU required to run a model.
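
      As a rough rule of thumb for sizing, the weights alone need about params × bytes-per-param of VRAM, plus headroom for the KV cache and activations. A back-of-the-envelope sketch (the 20% overhead factor is an assumption, not a measured figure):

          # Rough VRAM estimate; real usage adds KV cache, activations, and framework overhead.
          def vram_gb(params_billion: float, bytes_per_param: float, overhead: float = 1.2) -> float:
              """Approximate GB of VRAM needed just to hold the weights, with ~20% headroom."""
              return params_billion * bytes_per_param * overhead

          print(vram_gb(120, 2.0))  # Goliath 120B at fp16  -> ~288 GB (multi-GPU territory)
          print(vram_gb(120, 0.5))  # Goliath 120B at 4-bit -> ~72 GB (tight even on one 80 GB card)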

    • @user-nx7uh8db2g • 4 months ago

      @TomanswerAi Thanks. Have you tried using vLLM with the OpenAI API setup?
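
      For anyone curious, vLLM ships an OpenAI-compatible server, so the standard openai client can be pointed at a pod — a minimal sketch (the pod URL is a placeholder, and the model name must match whatever the server was launched with):

          # On the pod, start something like:
          #   python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-Instruct-v0.1
          from openai import OpenAI

          client = OpenAI(
              base_url="https://YOUR_POD_ID-8000.proxy.runpod.net/v1",  # placeholder pod URL
              api_key="anything",  # vLLM only checks this if launched with --api-key
          )

          resp = client.chat.completions.create(
              model="mistralai/Mistral-7B-Instruct-v0.1",
              messages=[{"role": "user", "content": "Say hello in one sentence."}],
          )
          print(resp.choices[0].message.content)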

  • @attilavass6935 • 5 months ago +1

    Which is the most cost-effective way to host our LLMs on RunPod: serverless, or a pod?
    Use case: nothing production-level, just testing different LLMs, including in autonomous agent networks, which can burn money pretty quickly on GPT-4. So: running local LLMs on RunPod a few times a day for a few hours; it shouldn't be always on, and the instance doesn't need to spin up very quickly...
    I think serverless is better for this use case, but I'm not sure, so what is your opinion?

    • @VP-nd9yy • 5 months ago +1

      I have a small issue with serverless: workers have a timeout, after which the GPU can be taken by another serverless worker, so depending on the instance you might have to wait. It's really annoying that way. The only workaround is to set minimum workers to one, so that a worker keeps a GPU regardless of whether it is handling a request. Still, serverless seems to be cost-effective.
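
      For context, calling a serverless endpoint looks roughly like the sketch below; /runsync blocks until a worker picks the job up, which is exactly where the cold-start wait described above shows up (the endpoint ID is a placeholder, and the input schema depends on your worker):

          import requests

          ENDPOINT_ID = "YOUR_ENDPOINT_ID"  # placeholder
          RUNPOD_API_KEY = "YOUR_API_KEY"   # placeholder

          resp = requests.post(
              f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
              headers={"Authorization": f"Bearer {RUNPOD_API_KEY}"},
              json={"input": {"prompt": "Hello!"}},  # input schema depends on the worker
              timeout=300,
          )
          print(resp.json())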

    • @TomanswerAi • 5 months ago +1

      Hi there, is keeping your data private critical for this testing? There are cheaper ways to test other LLMs, such as together.ai, but it depends on your requirements.

  • @kel78v2 • 2 months ago +1

    I keep getting "HTTP service not ready" for the ports. Is there an additional step required for this?

    • @TomanswerAi • 2 months ago

      Unfortunately I'm not aware of what would cause that error. All the required steps were in the video at the time of recording. Perhaps do a fresh setup and retry. A quick search suggests it may be intermittent. Sorry I can't help more.

    • @brad777luck • 2 months ago

      @TomanswerAi I've had this too, and I've tried a fresh setup and it doesn't seem to want to work at all. I don't know if contacting their support would help, but I feel like I wasted 20 bucks lol

    • @TomanswerAi • 2 months ago

      @brad777luck It appears the setup has changed since this video was made, unfortunately. I'll try to get around to checking what changes to the setup are needed.

  • @emiryuce1513 • 2 months ago

    I can't connect to HTTP port 7860; it says it's not ready. Also, in the logs I'm getting this error: "AttributeError: module 'gradio.layouts' has no attribute '__all__'". Can you help, please?

    • @jamesalxl3636 • a month ago

      I've had the same issues for OVER two months with no fixes... it's pretty sad.

  • @user-ob3yg2kn5u • 7 months ago +1

    I'm getting a 405. I don't think I used TheBloke's template with the API enabled; that's why, I guess.

    • @TomanswerAi • 7 months ago +1

      Hi mate, yeah, that would be key, as that specific template also opens a port for the API. Did you get it working after using that template?

    • @user-ob3yg2kn5u • 7 months ago

      Yeah, it worked with TheBloke's API template on port 5000. Out of interest, I don't see any bearer-token authentication on the API call. How do we make it ready for production with proper auth?
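
      For reference, a call against that port-5000 API looked roughly like the sketch below, assuming text-generation-webui's legacy blocking API, which TheBloke's template exposed around the time of the video (the route and payload have changed in newer versions, and the pod ID is a placeholder):

          import requests

          # RunPod proxies pod ports at <pod-id>-<port>.proxy.runpod.net
          URL = "https://YOUR_POD_ID-5000.proxy.runpod.net/api/v1/generate"

          payload = {"prompt": "Write a haiku about GPUs.", "max_new_tokens": 60}
          resp = requests.post(URL, json=payload, timeout=120)
          resp.raise_for_status()
          print(resp.json()["results"][0]["text"])  # legacy API response shape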

    • @TomanswerAi • 7 months ago

      Yeah, this is an important omission. To be honest, I didn't get the time to investigate how to set this up. It did appear to be in TheBloke's README. I may get a chance soon and report back.

    • @user-ob3yg2kn5u • 7 months ago +1

      @TomanswerAi I will also have a look. RunPod is apparently production-ready, so it must have auth.
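
      One generic way to bolt auth on, as a sketch rather than RunPod's own mechanism: front the model port with a small reverse proxy that checks a bearer token before forwarding. FastAPI and httpx are illustrative choices here, and MY_API_TOKEN is a hypothetical secret you would set on the pod:

          # pip install fastapi uvicorn httpx
          import os

          import httpx
          from fastapi import FastAPI, HTTPException, Request

          UPSTREAM = "http://127.0.0.1:5000"  # model API, bound to localhost only
          TOKEN = os.environ["MY_API_TOKEN"]  # hypothetical secret set on the pod

          app = FastAPI()

          @app.post("/api/v1/generate")
          async def generate(request: Request) -> dict:
              # Reject any request that lacks the expected bearer token.
              if request.headers.get("Authorization") != f"Bearer {TOKEN}":
                  raise HTTPException(status_code=401, detail="invalid or missing token")
              async with httpx.AsyncClient() as client:
                  upstream = await client.post(
                      f"{UPSTREAM}/api/v1/generate",
                      content=await request.body(),
                      headers={"Content-Type": "application/json"},
                      timeout=120,
                  )
              return upstream.json()

          # Run with: uvicorn proxy:app --host 0.0.0.0 --port 8080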