Run ANY LLM Using Cloud GPU and TextGen WebUI (aka OobaBooga)

  • Published 3 Jul 2024
  • In this video, I'll show you how to use RunPod.io to quickly and inexpensively spin up top-of-the-line GPUs so you can run any large language model. It's super easy, and you can run even the largest models, such as Guanaco 65B. This also includes a tutorial on Text Generation WebUI (aka OobaBooga), which is like Automatic1111 but for LLMs. Basically, an open-source interface for your LLM.
    Enjoy :)
    Join My Newsletter for Regular AI Updates 👇🏼
    www.matthewberman.com
    Need AI Consulting? ✅
    forwardfuture.ai/
    Rent a GPU (MassedCompute) 🚀
    bit.ly/matthew-berman-youtube
    USE CODE "MatthewBerman" for 50% discount
    My Links 🔗
    👉🏻 Subscribe: / @matthew_berman
    👉🏻 Twitter: / matthewberman
    👉🏻 Discord: / discord
    👉🏻 Patreon: / matthewberman
    Media/Sponsorship Inquiries 📈
    bit.ly/44TC45V
    Links:
    Runpod (Affiliate)- bit.ly/3OtbnQx
    Runpod The Bloke Template - runpod.io/gsc?template=qk29nk...
    HuggingFace - www.huggingface.co
    Guanaco Model - huggingface.co/TheBloke/guana...
    TextGen WebUI - github.com/oobabooga/text-gen...
  • Science & Technology

Comments • 173

  • @jeremybristol4374 • a year ago • +20

    I appreciate that you find and post these but also walk us through the setup. Huge time saver! Thank you!

    • @matthew_berman • a year ago • +2

      My pleasure, Jeremy!

    • @sally60 • a year ago

      @@matthew_berman Could you share how I can get an API to use with SillyTavern?

  • @pollywops9242 • a year ago

    I am doing it now; the uncensored model was the push I needed 😁

  • @wolphiekun • a year ago • +7

    Would be amazing to have a guide like this specifically for setting up the best model for coding with the largest token context window... for us plebs who do not have access to Anthropic yet 😁 Appreciate your hands-on, get-started-fast kind of flavor here, Matthew!

  • @joelzola5362 • a year ago • +4

    I'm surprised you don't have more followers. Keep going!

    • @matthew_berman • a year ago • +3

      Thank you very much. I hope to continue to grow and educate people on AI topics!

  • @dik9091 • a year ago • +1

    Thanks man, best vid so far for me and my quest to actually get things done ;)

    • @matthew_berman • a year ago

      You're welcome. What are you going to run on it?

    • @dik9091 • a year ago

      @@matthew_berman myself

  • @goldhydride • a year ago

    It's the content we deserve 😭 Everything is to the point, love this. I especially love your videos where you show us recent papers.
    Could I ask a question: what computer specs do I need to use a GPU cloud successfully? What built-in CPU and GPU characteristics do I need?

  • @contractorwolf • 10 months ago • +1

    Great content Matthew, thx

  • @MrPuschel • 9 months ago • +6

    TheBloke is not the author of these models, as stated in the model cards, but provides quantized versions of them.

  • @gelbandz • a year ago

    Thanks, this worked great!

  • @surajthakkar3420 • 9 months ago • +1

    Hey Matthew,
    Great video! When can we expect a video about training our own LLM?

  • @theresalwaysanotherway3996

    While 65B models are definitely beyond reasonable consumer hardware, to run 33B models all you need is 8 GB of VRAM and 32 GB of system RAM. I get ~1.1 tokens per second using an RTX 3070 and a Ryzen 5 3600, meaning you can run a lot of these SOTA models on pretty cheap local hardware.
    Also, a small correction: TheBloke doesn't make those models; he quantizes them to 4/5-bit so that we can all run them. It's super cool that he does that, but he doesn't *make* all the models listed there. Eric Hartford and Tim Dettmers are the two big model authors at the moment.

    • @NeuroScientician • a year ago • +1

      Any idea what the requirement would be for a 65B model? Do I need a full-fat A100?

    • @blablabic2024 • a year ago • +1

      @@NeuroScientician You would need dual 7900 XTX or dual 4090s; each has 24 GB, and in tandem that gives you 48 GB, enough to run and train the 65B model. An A100 is an $8,000 GPU, that's second-hand-car price level... and you also need a proper CPU to go with it, that's another $5,000... call it $20k US all combined. If you need that type of firepower, it's better to rent it.

    • @avg_ape • a year ago

      @@blablabic2024 Hi - how did you calculate the above requirements?

    • @adams546 • a year ago

      Are you using GGML or GPTQ?

    • @begaxo • a year ago

      Can I run 33B models with a 12 GB AMD GPU and 32 GB of RAM? If so, how? I'd be really thankful.

  • @EAAIO • 4 months ago

    Thanks for your tutorial; it saved me thousands of dollars just to try this. Now I can test.

  • @Syn_Slater • a year ago • +1

    Handy video, thanks!

  • @avg_ape • a year ago

    Thanks for the vid. Great find & insight. Can you make a vid that reviews some of TheBloke's models?

  • @Suro_One • a year ago • +1

    Cool, thanks!

  • @Anarchy-Is-Liberty • 7 months ago

    But we need a tutorial on training!! That's what a lot of us need! I want to build my own models for my own business, so I need to figure out how to train the AI to have a full understanding of what I'm doing and my data. Are there any videos you can point me to so I can start learning how to train this AI?

  • @Uterr • a year ago • +2

    You should add an annotation that when setting up a pod you should override the persistent storage, because RunPod sets persistent storage to 100 GB by default and it will eat up your budget very fast.

  • @ktolias • a year ago

    Amazing job! Thanks for sharing. I tried to train the same model on RunPod, but I had a difficult time. Can you please make a fine-tuning video? Much appreciated!

  • @Imran-Alii • a year ago

    Loved it... I appreciate your work!!!!

  • @jwesley235 • a year ago • +5

    FWIW, Ada is not pronounced "Ay-Dee-Ay"; it's "Ayda," as in Ada Lovelace, acclaimed as the first programmer.

  • @SirajFlorida • a year ago • +2

    Heads up: you can click the copy icon to the right of the label so you get a clean paste.

    • @matthew_berman • a year ago

      Yeah... thanks. I tried that but got a weird output the first time. Then I tried it again and it was perfect. I'll be doing that going forward!

  • @DarenZammit • a year ago • +1

    Thanks for your videos, really informative! The HuggingFace URL in the description goes to the wrong site :P

    • @matthew_berman • a year ago

      Lol, are you sure I didn't really want to send you to an emoji website? (jk, fixed, thank you)

  • @DanRegalia • 8 months ago

    Hey Matthew... Thanks so much for these videos. I've been binging on them at work and at home in my spare time. My goal is to soon be running a small local server for the house, with a P40 I picked up on eBay. I saw this, and I wanted to know if you have any videos that show how to set these up and run them locally (via oobabooga) and take advantage of the parameter settings... I'm also curious whether it's possible to have multiple models available to use. Thanks again for all this. Digging the MemGPT and AutoGen videos you've done. Just amazing.

  • @michael_gaio • 7 months ago

    that's awesome

  • @autophile525i • a year ago • +4

    Would you use this only for prototyping, or could these be left running reliably as the hardware behind a paid service?

  • @HampusAhlgren • 7 months ago • +1

    Quick note: TheBloke isn't actually the author of the models; he just converts existing models to support llama.cpp.

  • @rayankhan12 • a year ago • +1

    Nice!! I always wanted to know how to run open-source LLMs on cloud services like AWS, Azure and GCP... but they're so complicated... I've started a GCP course on YT too, but it's still difficult to learn.

    • @matthew_berman • a year ago • +1

      Runpod is suuuuper easy. Enjoy :)

    • @zion9142 • 11 months ago

      But you have to terminate your work. There's no point in training if it will be deleted afterward.

  • @jgrayhi • 8 months ago

    Thanks!

  • @mingyukang6592 • a year ago • +5

    Does it cost money to train, then turn off the GPU, and then use it again? And is it impossible to download the trained model to a local machine?

  • @djryanashton • a year ago

    Your videos are very good. One thing I needed to do to get it to run was to edit the pod and increase the volume space.

  • @kitrunner6596 • 2 months ago

    Your tech tutorials are bar none the best: clear, concise, with exact troubleshooting fixes. Can you do a tutorial on how to use VS Code with RunPod over SSH? Each time the server tries to connect, it asks for a password. I've gone through their troubleshooting but nothing is working.

  • @cyraq_0x248 • a year ago • +1

    Do you need to configure a new pod and download the model every time you want to try an LLM?

  • @urmatallatra • a year ago • +1

    perfect

  • @Laberding • 3 months ago

    Great video! Can you explain why there's usually a disconnect button but not in this case? If I terminate, do I have to set up the model again every time?

  • @SzczepanBentyn • a year ago • +1

    Is it possible to download a trained model to my local machine?

  • @RedShipsofSpainAgain • a year ago • +5

    I have a question. If you're working with proprietary or private data (like PII), and you don't want to risk sending that data over the internet to RunPod or OpenAI or whatever cloud-based model, how would you fine-tune on your data? Is local training on your own machine the only option?

    • @manoo2056 • a year ago • +1

      I hope someone or the author answers you. Great question.

    • @Shallowmind • 5 months ago

      Yes. Or sign a contract with someone who can train it for you.

  • @simonherd1768 • a year ago

    Thanks

  • @aihome242 • 8 months ago • +1

    If the model is trained on that pod, can it be saved or downloaded? If the data gets destroyed, what is the point of the training? I see this has been asked here but with no clear answer. Thanks!

  • @flimena • 8 months ago

    Got this working with AutoGen as expected. Works great, thanks!
    Running into the error "This model's maximum context length is 2048 tokens. However, your messages resulted in over 2170 tokens."
    I have the parameter max_consecutive_auto_reply=30 and "max_model_tokens": 1200 in the llm_config.
    Can't really get anything good out of it other than making it run.
    Suggestions?
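That "over 2170 tokens" error means the accumulated chat history plus the reply budget exceeded the model's 2048-token context window; lowering the reply's max tokens alone doesn't shrink the history. One generic workaround (a plain sketch, not an AutoGen-specific API, using a crude 4-characters-per-token estimate) is to trim the oldest messages before each call:

```python
def est_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def trim_history(messages: list[dict], context_limit: int, reply_budget: int) -> list[dict]:
    """Drop the oldest messages until history + reply budget fits the context.
    Each message is a dict like {"role": ..., "content": ...}."""
    budget = context_limit - reply_budget
    kept: list[dict] = []
    total = 0
    for msg in reversed(messages):  # walk newest-first so recent turns survive
        cost = est_tokens(msg["content"])
        if total + cost > budget:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))
```

A real fix would use the model's own tokenizer for counting, but even this rough version keeps a long agent conversation under the window.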

  • @Gorto68 • a year ago

    When I try following these instructions, I cannot move beyond saving the settings for the model. It doesn't show "saved," just the same error message on loading. Likewise if I try hitting reload: nothing happens. Can you please suggest what I might be doing wrong? Note: I only have this problem with the model in this video. I had no problem with Wizard-Vicuna-30B-Uncensored following the instructions in the other video.

  • @k9clubme • a year ago • +1

    Much appreciated for the info. BTW, which model is the closest to GPT-4 at the moment?

    • @Utoko • a year ago • +1

      Claude, but I guess you mean open source; then Falcon 40B, according to the HuggingFace leaderboard. But if you want to run locally, Wizard-Vicuna 13B is really good (at the GPT-3.5 level).

    • @matthew_berman • a year ago

      I just dropped a new video about Guanaco, which is def the closest to GPT-4.

    • @matthew_berman • a year ago • +1

      Stefan - I tested Falcon and it's unusably slow. Hopefully that'll be fixed soon. Right now Guanaco is the best.

    • @k9clubme • a year ago

      @@matthew_berman Thank you very much for all your efforts

  • @zeonos • 11 months ago

    Do these providers charge per use, or just for spinning it up and having it idle?

  • @fangornthewise • 7 months ago • +1

    How do we know if we need an RTX 6000 or if the 4090 is enough?
    Those extra cents of USD do stack up after a while for us in "developing" countries.

  • @JJSleo-bw9fr • 11 months ago

    Can you show how to create a persistent instance so the data is not destroyed but you're not being charged either? Is there a way to load it onto an SSD to use later (and be charged only for the SSD space, not the GPU)?

  • @Rundik • a year ago

    Can I use them for mining? Not crypto, but vanity addresses etc.

  • @hermysstory8333 • a year ago • +1

    It seems like TheBloke's template is missing.

  • @angel1st007 • a year ago • +1

    @matthew_berman - great job doing these videos. One question, though: once I have the model configured, is there a way I can use it via an API interface from that cloud GPU instance?

    • @dik9091 • a year ago

      Check the Serverless tab beside the GPU tab.

    • @angel1st007 • a year ago

      @@dik9091 If I go with Serverless instead of GPU Cloud, will that allow me to run the model with acceptable performance? The use case is basically to run one of these models via API and use it as an OpenAI API alternative. Would that be possible?
      @matthew_berman - if you can make a video on that topic, that would be greatly appreciated. Thanks!

    • @dik9091 • a year ago

      @@angel1st007 From what I understand, yes, that is exactly the point of serverless. Anyone can make some serious money with this when there are models that outperform GPT-4 on a private cloud.

    • @angel1st007 • a year ago • +1

      @@dik9091 - can you by any chance point me to a guide on how such a serverless service with an LLM model can be spun up? I really appreciate any help you can provide.

    • @dik9091 • a year ago

      @@xlretard send me an invite when you have it set up pls
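The serverless flow the replies describe boils down to: deploy a worker with your model, then POST a JSON job to the endpoint's URL with your API key. A hedged sketch of that call; the endpoint ID, API key, and the inner input keys are hypothetical placeholders (the inner schema depends entirely on your own worker's handler code):

```python
import json
import urllib.request

API_KEY = "YOUR_RUNPOD_API_KEY"  # hypothetical placeholder
ENDPOINT_ID = "abc123"           # hypothetical placeholder

def build_job(prompt: str, max_tokens: int = 200) -> dict:
    # The "input" wrapper is the job envelope; the inner keys are
    # whatever your handler expects, so treat them as assumptions.
    return {"input": {"prompt": prompt, "max_new_tokens": max_tokens}}

def run_sync(prompt: str) -> dict:
    req = urllib.request.Request(
        f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
        data=json.dumps(build_job(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:  # real network call; needs valid creds
        return json.load(resp)
```

Check RunPod's serverless docs for the current endpoint URLs and response shape before relying on any of this.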

  • @SantyBalan • a year ago

    Can the web UI be used by multiple users? Like, I set it up and create a few logins for other users to try out different models? Assuming the HTTP server is accessible.

  • @moon8013 • 8 months ago

    I'd like a video on how to train a model using these steps.

  • @NeuroScientician • a year ago • +1

    I am trying to work with 30B models and would like a test inference machine at home. Would a 7900 XTX / 4090 do? How about older enterprise stuff like the L40/M40? I am thinking of using RunPod or Lambda Labs for training.

    • @blablabic2024 • a year ago

      Yes, a 7900 XTX should suffice; you'll wait a little longer than on a 4090 for training time, but you'll save $1,000 that you can spend on a top-spec Ryzen 9 CPU and a nice amount of RAM. You can always (if you have the funding) get another 7900 XTX and run them as a pair to run and train a 65B model. I'll most probably go the same route.

  • @jamesalexander4411 • 8 months ago

    I've credited my RunPod account; however, after selecting the RTX 6000 Ada and TheBloke template, RunPod gives me the message "There are no longer any instances available with enough disk space". Does this mean there's no space available for me to run an LLM at this time?

  • @camelCased • a month ago

    Wondering if this method would also work with large Llama 3 models? Or is there a better and cheaper way to run them with an API endpoint? I'm just getting familiar with SillyTavern and roleplaying, and it supports OobaBooga, Ollama and Kobold (which I'm using locally for running small models).
    Also, it would be nice to know how to store the models on RunPod (if that's not too expensive) to avoid waiting for them to download every time I run the pod.

  • @jovialjack • 7 months ago

    Those options aren't coming up for me; when I choose "prompt" it only shows 4 options. I don't know why, but I've tried this a few times AND spent money on it... doesn't seem to work :/

  • @RobertAlexanderRM • 11 months ago

    Downloading meta-llama/Llama-2-13b-hf via RunPod's textgen gives a 403 unauthorized error. How do I fix it?

  • @nezukovlogs1122 • 9 months ago • +1

    When you say dollars per hour, does "per hour" mean GPU processing time, or uptime of the GPU server whether it's being used or not?

    • @d.paradyss8791 • 6 months ago

      It means uptime of the GPU server, I think. For me it's too expensive to generate worse text than ChatGPT and worse images than Midjourney.
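To put "dollars per hour of uptime" in numbers: billing is for wall-clock time the pod exists, not for tokens generated, so the only cost lever is terminating the pod when idle. A toy calculator (the rates below are illustrative examples, not current prices):

```python
def pod_cost(rate_per_hour: float, hours_up: float) -> float:
    """Uptime billing: you pay for every hour the pod is running,
    whether or not the GPU is actually doing inference."""
    return round(rate_per_hour * hours_up, 2)

# Illustrative: leaving an example $0.79/hr pod up all day costs the
# same whether you ran 2 prompts or 2,000.
all_day = pod_cost(0.79, 24)
two_short_sessions = pod_cost(0.79, 2)  # terminate between uses
```

The gap between those two numbers is the whole argument for spinning pods down aggressively.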

  • @Adamskyization • a year ago

    Can you stop the machine to save all of the configuration but stop the consumption of resources, so you don't get charged while the pod is stopped?
    So you can later just choose a preconfigured pod and run it,
    without having to download the model again, etc.?

  • @avi7278 • a year ago • +1

    Does this install the latest version of textgen web UI? I saw in other videos that RunPod has an old version as default. Also, the end wasn't very clear: you have to pay for the machine as if it's running to keep your data?

    • @matthew_berman • a year ago

      Use TheBloke's version; it's the best implementation, with all the right things downloaded already.
      Some machines have a "data only" mode which is much less expensive. Then when you want to use it, you spin it back up and pay the regular price with your existing data. But yes, you need to pay to keep your data.

    • @zion9142 • 11 months ago

      Please do a video on this.

  • @A.M.8181 • a year ago

    How do I upload my dataset for training a LoRA model in this cloud?

  • @ecorodri26 • 11 months ago

    Could someone help me? Does the oobabooga web UI work locally with combined CPU+GPU usage when installed with GPU support?

  • @zilibabwei • a year ago • +1

    This is great! Tysm! Your channel is an amazing resource! I fell asleep last night still connected to the RTX 6000 and woke up 7 bucks down! Lol. But it just goes to show that it's really not too expensive to use these resources! It only took me about 5 minutes to download TheBloke's Guanaco 65B. Does that seem right? I was expecting much longer.

    • @zilibabwei • a year ago • +1

      My main goal is to have a Python coding assistant with me all the time. Something that is great at generating code from English prompts. (I'm not so worried about it knowing the US presidents! lol! Or anything else, for that matter - I just want a little AI bot that's obsessed with writing code!) I also want it to remember from one session to the next for longer projects. Does this exist already? Does anyone know? And if it doesn't, how do I train up maybe a barebones model to become that?

  • @eyemazed • 8 months ago

    I'm confused about the pricing. Say I do 2 inferences per hour for a day; is that still 24 hours charged for that day?

    • @mag0b3t0 • a month ago

      Only while you're using it, until you terminate it 7:34

  • @JoseP-cw3je • a year ago • +2

    Pretty good video, but how do I change the UI to chat mode? I get an error when I try to change it.

    • @matthew_berman • a year ago

      What error are you getting?

    • @JoseP-cw3je • a year ago

      @matthew_berman Thank you for your reply. I get "bad gateway" after I choose the chat option in the UI, and then I cannot reconnect to the server unless I restart it.

  • @nikog8326 • a year ago

    How do I change the install location when pasting and downloading an LLM into that "model" section on RunPod?

    • @nikog8326 • a year ago

      It keeps saying no space left on device.

  • @PrincessRedine • a year ago

    I am getting this error when running Pygmalion-13B:

      Traceback (most recent call last):
        File "/workspace/text-generation-webui/modules/GPTQ_loader.py", line 17, in <module>
          import llama_inference_offload
      ModuleNotFoundError: No module named 'llama_inference_offload'

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "/workspace/text-generation-webui/server.py", line 62, in load_model_wrapper
          shared.model, shared.tokenizer = load_model(shared.model_name, loader)
        File "/workspace/text-generation-webui/modules/models.py", line 66, in load_model
          output = load_func_map[loader](shared.model_name)
        File "/workspace/text-generation-webui/modules/models.py", line 262, in GPTQ_loader
          import modules.GPTQ_loader
        File "/workspace/text-generation-webui/modules/GPTQ_loader.py", line 21, in <module>
          sys.exit(-1)
      SystemExit: -1

  • @PsyGenLab • 2 months ago

    TheBloke template is broken atm.
    Use this instead:

  • @MrVbrabble • a year ago • +1

    Is there a way to read and store files on your computer or a designated drive with this method?

    • @matthew_berman • a year ago • +1

      Can you clarify what you're trying to do?

    • @MrVbrabble • a year ago

      @@matthew_berman Using this setup, I would like it to read and write files on my computer. For example, say I have a PDF or text file; I would like it to read the file, understand and summarize it, and save the summary to a txt file. Thank you.

  • @ybwang7124 • a year ago • +1

    So is it as good as GPT-4? The description is confusing.

  • @Lucasbrlvk • a year ago • +1

    👍😯

  • @puredingo9348 • a year ago

    So does this mean I won't have to install OobaBooga on my PC to run it?

  • @olafge • a year ago • +1

    I'd like to save TheBloke's template so that I always have easy access to it. How do I do that?

  • @dirklaubscher2369 • 2 months ago

    I'm getting an "HTTP Service [Port 7860] Not Ready" message. What do I do?

    • @Mirza-sb2gl • 2 months ago

      Same; have you figured out how to fix it?

  • @jebathuraijb4374 • a year ago

    Can I load the GPTQ model in Colab?

  • @DailyProg • 5 months ago

    Can you post something like this but for GCP or Azure?

  • @user-fy6de1lq3i • 11 months ago

    It doesn't let me open HTTP; it's making me use JupyterLab. I have no idea how to switch, please help. I have no option for HTTP. I'm losing money as I type this, please help!

  • @user-wr4yl7tx3w • a year ago • +2

    Can we fine-tune it for a downstream task?

  • @nat.serrano • 9 months ago

    And how can I add an API?

  • @christopherchilton-smith6482

    Do one of these for StarCoder :)

  • @kel78v2 • 2 months ago

    I keep seeing "HTTP Service [Port 7860] Not Ready" no matter what GPU I choose.

  • @oolegdan9813 • 8 months ago • +1

    Hi, could you please guide me? I'm not able to set up the "Runpod TheBloke Template". When I hit connect and then "Connect to HTTP Port", I'm redirected to a new window where I see the message "Confirm the character deletion?" Please let me know what I am doing wrong.

    • @raducamman • 8 months ago

      Same here. But it worked a couple of days ago. I think something happened to the template.

    • @raducamman • 8 months ago

      So it seems only the interface is a bit messed up. There are some layers of divs on top of it, and you can just delete them in the browser until they fix the issue.

    • @oolegdan9813 • 8 months ago • +1

      @@raducamman Thanks! It loaded for me today and works just as intended :)

  • @wsy987 • a year ago

    I can only load GPTQ, but not GGML. Weird.

  • @qbert4325 • 5 months ago

    Is there any free cloud option?

  • @TheStallion1319 • a month ago

    What is the benefit of running a cloud GPU vs. locally? Is there any pro to a local GPU?

    • @mag0b3t0 • a month ago

      Not having to wait for a model to download and start running every time, and not paying them for their slowness in that respect (in theory you're renting just a GPU, but apparently you're paying for the whole system's uptime, regardless of actual GPU usage).

    • @TheStallion1319 • a month ago

      @@mag0b3t0 No, I meant it in a practical way: is there any technical difference? Assuming I'm using a model to develop an application, would my experience running it in the cloud be different from running it locally? Is there something I wouldn't be able to do, or would do less efficiently?

  • @generichuman_ • 8 months ago • +1

    Is there an option to access this via an API endpoint?

    • @paulkiragu8120 • 6 months ago • +2

      Use a service like Ollama web UI, which you can configure to talk to external LLM models.

    • @waynehawley814 • 3 months ago

      I have a video on my page showing how to use the Text Generation WebUI with its API extension.
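For the curious: text-generation-webui ships an API extension (enabled with the --api flag) that exposes an HTTP generate endpoint. A hedged sketch against its legacy /api/v1/generate route; the host, port, and exact parameter set vary by version, so verify against the webui docs for your build:

```python
import json
import urllib.request

HOST = "http://localhost:5000"  # default port for the legacy API extension (verify for your build)

def build_payload(prompt: str, max_new_tokens: int = 200, temperature: float = 0.7) -> dict:
    # Parameter names follow the webui's generate API; the exact set
    # accepted depends on the version you are running.
    return {
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
    }

def generate(prompt: str) -> str:
    req = urllib.request.Request(
        f"{HOST}/api/v1/generate",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # requires a running webui started with --api
        return json.load(resp)["results"][0]["text"]
```

Newer builds also expose an OpenAI-compatible route, which lets existing OpenAI client code point at the pod instead.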

  • @meworlds8216 • a year ago

    This template doesn't have a Jupyter notebook; that sucks, no?

  • @islamicinterestofficial • 10 months ago

    Thanks for the video. Can we use the lmsys/vicuna-33b-v1.3 model with it? Or can we only use models published under TheBloke? Please answer; much appreciated.

  • @holdthetruthhostage • a year ago

    My brother, I hope it works and is affordable.

  • @leonwinkel6084 • a year ago • +2

    Is there a way to call it all via API?

    • @davidnobles162 • a year ago • +1

      I have the same question; I'm looking into it right now.

    • @gr8ston • a year ago

      @@davidnobles162 Did you figure it out? I am able to set up the UI template; however, I want the same to be accessible via API in my local Python code.

    • @davidnobles162 • a year ago

      @@gr8ston I did! Super simple to set up. Not sure why YouTube won't let me post the instructions.

    • @davidnobles162 • a year ago • +1

      @@gr8ston dnobs/runpod-api

    • @davidnobles162 • a year ago

      Somehow YouTube is blocking every comment where I explain ANYTHING. Good luck.

  • @morespinach9832 • a month ago

    We don't need a GPU if real-time high performance is not a must.

  • @yyyzzz-k3r • 11 months ago

    My page only shows 1 GPU; it doesn't offer 4 GPUs.

  • @darkbelg • a year ago • +1

    I feel like you failed to underline that you can get an A100 for 2 dollars an hour!

    • @darkbelg • a year ago

      Also, TheBloke uses RunPod to make the models.

    • @matthew_berman • a year ago • +2

      Why is that so important? :)

  • @behnamplays • a year ago • +1

    Not affordable, though. If I run it for a day (e.g., 20 hours), I'll be charged the same amount I'm paying monthly for ChatGPT 🤔

    • @matthew_berman • a year ago • +1

      True. But you can turn it off/on easily, so you can spread those 20 hours over the course of a month.

    • @gr8ston • a year ago

      Use the bid price. I got a GPU worth $1.79/hour for just 10% of its cost, at $0.179, as a bid price.

    • @zion9142 • 11 months ago

      @@matthew_berman In your video you said there's a terminate button and all the data is gone. So how do you turn it off?

  • @QEDAGI • a year ago • +6

    You're not the first to post an episode using RunPod -- you're the latest. While it's great to use dedicated RunPod servers, why don't any of y'all post using their less expensive community servers?

  • @BarkSaw • 7 months ago

    Fuck, it's paid? If I'm trying to make an uncensored GPT, I don't want my credit card linked to it.

    • @paulkiragu8120 • 6 months ago

      What do you mean, "it's paid"? How cheap can you get, to expect a GPU to be given out for free 😮

    • @BarkSaw • 6 months ago

      @@paulkiragu8120 Definitely a "😮" moment

  • @IvanRosaT • 6 months ago

    While this is awesome, is there a way to use oobabooga to generate images?

  • @klammer75 • a year ago • +1

    I was able to get the smaller models working, but not this guanaco-65B one... basically a config error saying the repo doesn't have a file named config.json... then a link that goes to a HuggingFace 404 page 🫣🤷🏼‍♂️😔

    • @CodyRiverW • a year ago • +1

      Same

    • @klammer75 • a year ago

      Can't even get the smaller models to work now 🤷🏼‍♂️🤦🏼‍♂️

  • @JosephConroy • a year ago

    Thanks!

    • @JosephConroy • a year ago

      Hey Matthew - did you ever get a chance to make a video on training models in the RunPod GUI?