Host your own LLM in 5 minutes on RunPod, and set up an API endpoint for it.

  • Published 28 Nov 2023
  • Link to Google Colab used in demo - colab.research.google.com/dri...
    Link to where the LLM can be hosted - www.runpod.io/
    Please note: if you are using this for anything other than testing, you should restrict access with an API key.
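
    As a hedged illustration of that last point, a client would pass the key in a request header; everything below is a placeholder, since the unrestricted setup shown in the video doesn't check one:

        import requests

        API_URL = "https://YOUR_POD_URL/api/v1/generate"  # placeholder endpoint
        API_KEY = "YOUR_SECRET_KEY"                       # placeholder key

        resp = requests.post(
            API_URL,
            # Only meaningful once something server-side enforces the token.
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"prompt": "Hello!", "max_new_tokens": 32},
        )
        print(resp.json())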

Comments • 24

  • @kaynkayn9870 • 4 months ago +1

    I couldn't get it running all day yesterday; your step-by-step approach is wonderful. Thank you.

    • @TomanswerAi • 4 months ago

      Great to hear it. Pleased it helped.

  • @hamanaldhekair5623 • 4 months ago +1

    God Bless you kind soul ❤

  • @nemai1337 • a month ago

    Isn't this a bit slow for a 7B model running on a (freakin'!) H100?
    I'm getting roughly the same speed here with an RTX 2070 and 5-bit quantized 7B models...
    Thanks for the tutorial though; I was going to look into RunPod anyway.

  • @user-pr6nm2di6d • 4 months ago +1

    I was really looking for something like this. Thank you so much. Can you make a video on how to use Agentkit by BCG?

    • @TomanswerAi • 4 months ago

      Great to hear it. Pleased it helped.

    • @TomanswerAi • 4 months ago

      I can take a look at Agentkit. Just need to find the time 😆

  • @user-nx7uh8db2g • 4 months ago +1

    Hi Thomas, can you provide guidance on how we select the GPU based on the model we would like to test? For example, if I want to test Goliath 120b at reasonable speeds, how do I know which GPUs to deploy? Thanks.

    • @TomanswerAi • 4 months ago

      Hi there (sorry, I don't have your name). To be honest, the best thing to do would be to look for details about the particular model on Hugging Face, YouTube, and Twitter, to see whether anyone has tested it on a particular machine. I'm afraid I'm no expert on specific models, and I would follow that same process myself to determine the GPU required to run a model.
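
      As a rough rule of thumb for sizing, the weights alone need about params × bytes-per-param of VRAM, plus headroom for the KV cache and activations. A back-of-the-envelope sketch (the 20% overhead factor is an assumption, not a measured figure):

          # Rough VRAM estimate; real usage adds KV cache, activations, and framework overhead.
          def vram_gb(params_billion: float, bytes_per_param: float, overhead: float = 1.2) -> float:
              """Approximate GB of VRAM needed just to hold the weights, with ~20% headroom."""
              return params_billion * bytes_per_param * overhead

          print(vram_gb(120, 2.0))  # Goliath 120B at fp16  -> ~288 GB (multi-GPU territory)
          print(vram_gb(120, 0.5))  # Goliath 120B at 4-bit -> ~72 GB (tight even on one 80 GB card)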

    • @user-nx7uh8db2g • 4 months ago

      @TomanswerAi Thanks. Have you tried using vLLM with the OpenAI API setup?
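
      For anyone curious, vLLM ships an OpenAI-compatible server, so the standard openai client can be pointed at a pod — a minimal sketch (the pod URL is a placeholder, and the model name must match whatever the server was launched with):

          # On the pod, start something like:
          #   python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-Instruct-v0.1
          from openai import OpenAI

          client = OpenAI(
              base_url="https://YOUR_POD_ID-8000.proxy.runpod.net/v1",  # placeholder pod URL
              api_key="anything",  # vLLM only checks this if launched with --api-key
          )

          resp = client.chat.completions.create(
              model="mistralai/Mistral-7B-Instruct-v0.1",
              messages=[{"role": "user", "content": "Say hello in one sentence."}],
          )
          print(resp.choices[0].message.content)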

  • @attilavass6935 • 5 months ago +1

    Which is the most cost-effective way to host our LLMs on RunPod: serverless, or a pod?
    Use case: nothing production-level, just testing different LLMs, including in autonomous agent networks, which can burn money pretty quickly on GPT-4. So: running local LLMs on RunPod a few times a day for a few hours; it shouldn't be always on, and the instance doesn't need to spin up very quickly...
    I think serverless is better for this use case, but I'm not sure, so what is your opinion?

    • @VP-nd9yy • 5 months ago +1

      I have a small issue with serverless: workers have a timeout, after which the GPU can be taken by another serverless worker, so depending on the instance you might have to wait. It's really annoying that way. The only workaround is to set minimum workers to one, so that a worker keeps a GPU regardless of whether it is handling a request. Still, serverless seems to be cost-effective.
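
      For context, calling a serverless endpoint looks roughly like the sketch below; /runsync blocks until a worker picks the job up, which is exactly where the cold-start wait described above shows up (the endpoint ID is a placeholder, and the input schema depends on your worker):

          import requests

          ENDPOINT_ID = "YOUR_ENDPOINT_ID"  # placeholder
          RUNPOD_API_KEY = "YOUR_API_KEY"   # placeholder

          resp = requests.post(
              f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
              headers={"Authorization": f"Bearer {RUNPOD_API_KEY}"},
              json={"input": {"prompt": "Hello!"}},  # input schema depends on the worker
              timeout=300,
          )
          print(resp.json())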

    • @TomanswerAi • 5 months ago +1

      Hi there, is keeping your data private critical for this testing? There are cheaper ways to test other LLMs, such as together.ai, but it depends on your requirements.

  • @kel78v2 • 2 months ago +1

    I keep getting "HTTP service not ready" for the ports. Is there an additional step required for this?

    • @TomanswerAi • 2 months ago

      Unfortunately I'm not aware of what would cause that error. All the required steps were in the video at the time of recording. Perhaps do a fresh setup and retry. A quick search suggests it may be intermittent. Sorry I can't help more.

    • @brad777luck • 2 months ago

      @TomanswerAi I've had this too, and I've tried a fresh setup and it doesn't seem to want to work at all. I don't know if contacting their support would help, but I feel like I wasted 20 bucks lol

    • @TomanswerAi • 2 months ago

      @brad777luck It appears the setup has changed since this video was made, unfortunately. I'll try to get around to checking what changes to the setup are needed.

  • @emiryuce1513 • 2 months ago

    I can't connect to HTTP port 7860; it says it's not ready. Also, in the logs I'm getting this error: "AttributeError: module 'gradio.layouts' has no attribute '__all__'". Can you help, please?

    • @jamesalxl3636 • a month ago

      I've had the same issues for OVER two months with no fixes... it's pretty sad.

  • @user-ob3yg2kn5u • 7 months ago +1

    I'm getting a 405. I don't think I used TheBloke's template with the API enabled; that's why, I guess.

    • @TomanswerAi • 7 months ago +1

      Hi mate, yeah, that would be key, as that specific template also opens a port for the API. Did you get it working after using that template?

    • @user-ob3yg2kn5u • 7 months ago

      Yeah, it worked with TheBloke's API template on port 5000. Out of interest, I don't see any bearer-token authentication on the API call. How do we make it ready for production with proper auth?
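
      For reference, a call against that port-5000 API looked roughly like the sketch below, assuming text-generation-webui's legacy blocking API, which TheBloke's template exposed around the time of the video (the route and payload have changed in newer versions, and the pod ID is a placeholder):

          import requests

          # RunPod proxies pod ports at <pod-id>-<port>.proxy.runpod.net
          URL = "https://YOUR_POD_ID-5000.proxy.runpod.net/api/v1/generate"

          payload = {"prompt": "Write a haiku about GPUs.", "max_new_tokens": 60}
          resp = requests.post(URL, json=payload, timeout=120)
          resp.raise_for_status()
          print(resp.json()["results"][0]["text"])  # legacy API response shape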

    • @TomanswerAi • 7 months ago

      Yeah, this is an important omission. To be honest, I didn't get the time to investigate how to set this up. It did appear to be in TheBloke's README. I may get a chance soon and report back.

    • @user-ob3yg2kn5u • 7 months ago +1

      @TomanswerAi I will also have a look. RunPod is apparently production-ready, so it must have auth.
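
      One generic way to bolt auth on, as a sketch rather than RunPod's own mechanism: front the model port with a small reverse proxy that checks a bearer token before forwarding. FastAPI and httpx are illustrative choices here, and MY_API_TOKEN is a hypothetical secret you would set on the pod:

          # pip install fastapi uvicorn httpx
          import os

          import httpx
          from fastapi import FastAPI, HTTPException, Request

          UPSTREAM = "http://127.0.0.1:5000"  # model API, bound to localhost only
          TOKEN = os.environ["MY_API_TOKEN"]  # hypothetical secret set on the pod

          app = FastAPI()

          @app.post("/api/v1/generate")
          async def generate(request: Request) -> dict:
              # Reject any request that lacks the expected bearer token.
              if request.headers.get("Authorization") != f"Bearer {TOKEN}":
                  raise HTTPException(status_code=401, detail="invalid or missing token")
              async with httpx.AsyncClient() as client:
                  upstream = await client.post(
                      f"{UPSTREAM}/api/v1/generate",
                      content=await request.body(),
                      headers={"Content-Type": "application/json"},
                      timeout=120,
                  )
              return upstream.json()

          # Run with: uvicorn proxy:app --host 0.0.0.0 --port 8080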