Host your own LLM in 5 minutes on RunPod, and set up an API endpoint for it.
- Added 28 Nov 2023
- Link to Google Colab used in demo - colab.research.google.com/dri...
- Link to where LLM can be hosted - www.runpod.io/
Please note: if you are using this for anything other than testing, you should restrict access with an API key.
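To illustrate the note above about restricting access with an API key, here is a minimal sketch of a call to a pod-hosted endpoint that passes a key as a bearer token. The pod URL, route, and key below are placeholders, not values from the video; the exact route depends on which template you deploy.

```python
import json
import urllib.request

# Placeholders -- substitute your own pod URL and key.
POD_URL = "https://YOUR_POD_ID-5000.proxy.runpod.net/api/v1/generate"
API_KEY = "YOUR_API_KEY"

def build_request(prompt: str, max_new_tokens: int = 200) -> urllib.request.Request:
    """Build a POST request for the pod's generate endpoint,
    passing the API key as a bearer token."""
    payload = json.dumps(
        {"prompt": prompt, "max_new_tokens": max_new_tokens}
    ).encode()
    return urllib.request.Request(
        POD_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )

# To actually send it:
# with urllib.request.urlopen(build_request("Hello!")) as resp:
#     print(json.load(resp))
```

Whether the server enforces the key depends on the template; the header shown is the conventional way to send one.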
I couldn't get it running the whole day yesterday; your step-by-step approach is wonderful. Thank you.
Great to hear it. Pleased it helped.
God Bless you kind soul ❤
Is this not a bit slow for a 7B model running on a (freakin!) H100?
Getting roughly the same speed here with an RTX 2070 and 5-bit quantized 7B models ...
Thanks for the tutorial though, was gonna look into RunPod.
I was really looking for something like this. Thank you so much. Can you make a video on how to use AgentKit by BCG?
Great to hear it. Pleased it helped.
I can take a look at Agentkit. Just need to find the time 😆
Hi Thomas, can you provide guidance on how to select the GPU based on the model we would like to test? For example, if I want to test Goliath 120b at reasonable speeds, how do I know which GPUs to deploy? Thanks.
Hi there (sorry, don't have your name). Tbh the best thing to do for this would be to try and find details around the particular model on Hugging Face, YouTube and Twitter to determine if anyone has tested it on a particular machine. I'm no expert for specific models I'm afraid, and would follow the above process myself to determine the GPU required to run a model.
Thanks. Have you tried using vLLM with the OpenAI API setup?@@TomanswerAi
Which is the most cost-effective way to host our LLMs on RunPod: using serverless or using a pod?
Use case: nothing production level, just testing different LLMs, even in some autonomous agent networks, which can burn money pretty quickly using GPT-4. So running local LLMs on RunPod a few times a day for some hours; it should not be always on, and the instance does not need to spin up very quickly...
I think serverless is better for this use case, but I'm not sure, so what is your opinion?
I have a small issue with serverless: they have a timeout, after which the GPU can be used by another serverless worker. So depending on the instance you might have to wait, which is really annoying. The only workaround is to set minimum workers to one, so that a worker keeps a GPU regardless of whether it is completing a request or not. Still, serverless seems to be cost-effective.
Hi there, is keeping your data private critical for this testing? There are cheaper ways to test other LLMs, such as together.ai, but it depends on your requirements.
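For the serverless option discussed in this thread, a worker only bills while it is processing a request. RunPod's serverless API exposes a synchronous route of the form `https://api.runpod.ai/v2/<endpoint_id>/runsync` (check the current docs for the exact payload your worker expects); the endpoint ID and key below are placeholders. A sketch of such a call:

```python
import json
import urllib.request

# Placeholders -- both come from the RunPod console, not from the video.
RUNPOD_API_KEY = "YOUR_RUNPOD_API_KEY"
ENDPOINT_ID = "YOUR_ENDPOINT_ID"

def serverless_request(prompt: str) -> urllib.request.Request:
    """Build a synchronous request to a RunPod serverless endpoint.
    The input schema ({"input": {...}}) is interpreted by your worker code."""
    url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"
    payload = json.dumps({"input": {"prompt": prompt}}).encode()
    return urllib.request.Request(
        url,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {RUNPOD_API_KEY}",
        },
    )

# To actually send it:
# with urllib.request.urlopen(serverless_request("Hello!")) as resp:
#     print(json.load(resp))
```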
I keep getting HTTP service not ready for the ports. Is there an additional step required for this?
Unfortunately I'm not aware of what would cause that error. All steps required were in the video at the time of making. Perhaps do a fresh setup and retry. A quick search showed it may be intermittent. Sorry I can't help more.
@@TomanswerAi I've had this too and have tried it fresh and it doesn't seem to wanna work at all. Idk if contacting their support would help, but I feel like I wasted 20 bucks lol
@@brad777luck It appears the setup has changed since this video was made unfortunately. I'll try and get around to checking what changes to the setup are needed
I can't connect to HTTP port 7860; it says it's not ready. Also in the logs I am getting this error: "AttributeError: module 'gradio.layouts' has no attribute '__all__'". Can you help please?
Have had the same issues for OVER two months with no fixes.. it's pretty sad
I'm getting a 405. I don't think I used TheBloke's template with "API" enabled, that's why I guess.
Hi mate, yeah that would be key, as that specific template also opens a port for the API. Did you get it working after using that template?
Yeah, it worked with TheBloke's API template on port 5000. Out of interest, I don't see any bearer token authentication for the API call. How do we make it ready for production with proper auth?
Yeah, this is an important omission. I didn't actually get the time to investigate how to set this up, tbh. It did appear to be in TheBloke's readme. I may get a chance soon and report back.
@@TomanswerAi Will also have a look. RunPod is apparently production-ready, so it must have auth