How To Install TextGen WebUI - Use ANY MODEL Locally!
- Published 18 Jun 2023
- In this video, I show you how to install TextGen WebUI on a Windows machine and get models installed and running. TextGen WebUI is like Automatic1111 for LLMs. Easily run any open source model locally on your computer.
Enjoy :)
Join My Newsletter for Regular AI Updates 👇🏼
www.matthewberman.com
Need AI Consulting? ✅
forwardfuture.ai/
Rent a GPU (MassedCompute) 🚀
bit.ly/matthew-berman-youtube
USE CODE "MatthewBerman" for 50% discount
My Links 🔗
👉🏻 Subscribe: / @matthew_berman
👉🏻 Twitter: / matthewberman
👉🏻 Discord: / discord
👉🏻 Patreon: / matthewberman
Media/Sponsorship Inquiries 📈
bit.ly/44TC45V
Links:
Github Repo - github.com/oobabooga/text-gen...
Install Commands - gist.github.com/mberman84/f09...
The Bloke HF - huggingface.co/TheBloke
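For reference, the install commands in the linked gist roughly follow this shape (a sketch of the flow shown in the video; the exact Python version and PyTorch index URL are assumptions, so check the gist for the canonical commands):

```shell
# Create and activate a fresh conda environment
conda create -n textgen python=3.10 -y
conda activate textgen

# Install a CUDA-enabled PyTorch build (pick the index URL matching your CUDA version)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

# Clone the repo and install its requirements
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt

# Launch the web UI (by default it serves on http://127.0.0.1:7860)
python server.py
```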
Very nice instructions. Would have loved to see a sneak peek of the UI at the start, and a chat / text chat at the end (just to see if it formats nicely and such).
Thanks for all the great content.
Great summary! Text gen UI is built entirely in Gradio. It might be helpful to your viewers to highlight this fact somewhere as well. Gradio lets you build robust web UIs like Oobabooga and Automatic1111, all in your friendly Python.
History will look back on your videos as either an amazing service to humanity or the beginning of the end. Either way you do great work and we all appreciate your effort. Keep em coming!
Well, I sure do hope my videos are viewed positively. :P
Absolutely wonderful. Thank you so much! I had some problems getting this to work, but this video (and googling some error codes) helped me out tremendously. This is the best guide on how to get this running. Thank you!
Awesome! Glad to hear.
2:57 I am genuinely so grateful to you for this GPU setup. I was trying to run TheBloke's Mistral 7B Instruct on Kaggle but was stuck for 3 days straight. I was losing my entire mind. It was super slow, and I posted here and there, no answer.
Last evening I was skeptically watching your videos (I do like your videos, but this time I was mentally fried because my model wasn't working).
Then I followed the GPU setup and guess what it ran!!
Thank you so much for posting these mistakes and their solutions. This saved my life.
We need more trainers like you. God bless you.😊
Only tutorial that has worked for me. Props to you, sir
Thanks for the video. I couldn't install it with the one-click installer but was able to easily follow your guide.
You are really a legend. I've never seen a YouTube channel like this, and I want to make similar content to spread knowledge.
Excellent! Thank you, Matthew. Please keep up the good work. I am looking forward to seeing how to upload my own documents to the Web UI and ask questions about them (like PrivateGPT).
I would love to be able to do that too !
Thank you so much! Easy to follow instructions as well! Great video! I subscribed, not sure why I wasn't before but I am glad I am now!
Savior! Got your video just 1 minute after publish (first like and comment). I just started following 1 week back; subbed, liked, shared. Goddamn, YouTube, you saved my college, my project, and even now my job and startup.
Awesome content. I'd love to see some content about how you think about and go about changing the arguments and settings for a custom experience, since a lot of us aren't such wizards yet with advanced CLI manipulation.
Thank you, you are the only one who explained a way to do it that wasn't their one-click installer, and surprisingly it works on my CPU. Thanks again.
Thanks brother for the clear cut information ❤
Thanks for the detailed install instructions. Python installations are always so incredibly difficult with all the virtual environment craziness that I often don't even bother to trust the instructions in the software's own documentation. There's always something they leave out and it doesn't work. Without verification from a third party I assume anything that uses Python is an exercise in frustration. That's why I'm subbed to this channel.
As someone who has been learning Python since the age of 13, I actually agree with you that virtual environments often make things more difficult than they should be. The entire "benefit" of a virtual environment is that it's restrictive: it makes your Python code unable to reach files and libraries installed elsewhere on the system. I guess it has great use when distributing. However, I have personally most often skipped virtual environments unless I know for a fact that I have libraries installed that will conflict with the required packages of the repository, or unless I am really worried that the Python code may be a virus. The reason being, I don't want to install the same library 10 times for 10 different repositories. I want to install the library once and then be able to use all code that depends on it.
I guess the reason all tutorials tell you to use a virtual environment is to guarantee that the installation you perform is 100% reproducible. If you install code without a virtual environment, you may run into special cases that apply specifically to your computer because of your setup. However, if you are good with Python, you will always be able to work around those bugs.
couldn't agree more. This is why I stopped using python for my own scripting. Why it's so widely used and popular with all these environment issues is still beyond me! UGH!
Thanks for this video, couldn't get it working before following this tutorial!
Thank you for the information, I was struggling with the python error, thanks for all your posts.
Incredible video. Thank you very much for the step-by-step installation.
Amazing content! Thanks for sharing such outstanding content! Also, please keep it up! PS: It would be awesome to have a look at this subject for Linux.
Excellent. Thanks for running through the little errors I am tired of looking at curated vids "look how easy". Thankfully, this time I didn't run into the issues that you demonstrated (perhaps they've fixed some things in their install scripts - I didn't use one click). The only thing that would've been nice is to have a quick hello world type example at the end (based on the TheBloke/vicuna-13b-v1.3.0-GPTQ we downloaded as part of the tute). Just so we can see something working.
Michael Malice meets Sam Harris. I mean that endearingly. I appreciate you taking the time to make this video as straight forward as it is.
I think you should mention that the reason GPTQ runs better on your machine is because you have a good graphics card. For those with a weak GPU you should use GGML as long as your CPU is decent. That also means you're not going to use cuda.
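To make the GGML-on-CPU route above concrete: the web UI has a CPU-only mode. A hedged sketch (the --cpu and --model flag names reflect the repo at the time of the video; the model folder name is a made-up example, so check python server.py --help on your version):

```shell
# Run inference fully on the CPU with a GGML model (no CUDA required)
# "TheBloke_WizardLM-7B-GGML" is a placeholder for whatever folder sits in ./models
python server.py --cpu --model TheBloke_WizardLM-7B-GGML
```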
I am just kind of happy the one-click installer worked flawlessly for me 😅 I might work in IT, but there are boundaries. Thanks for the vid.
You can add conda to your environment variables on Windows systems to use conda in all terminals. I personally use the cmd prompt in VS Code.
A+ content, as always!
Nice explanation. A video on fine-tuning would be great: one on how to do it, and another on when you should fine-tune instead of just prompting. Thanks
Yep! The main issue is I don't have a dataset I can use as an example, so I would just be using the same fine-tuning dataset as another model.
This is going to be the case for most people. So to address your point, 99% of the time people just need to use prompting, not fine-tuning. But it's fun to know how though!
Your content has been very educational. Thank you. I would appreciate it if you activated subtitles.
Hmm...you're the second person to say that. I thought they were automatically on...
Very useful stuff. Thank you
Finally, a tutorial that works. Thanks a lot
Great video thank you. Interested to see the training video
Hi Matthew, I tried to read your command line at 2:07, but when I pause the video the YouTube bottom bar covers the text.
you are my hero, and doing a great job.
Thank you very much for stepping thru this. It is somewhat disheartening to see that even the "pro" has to go thru several steps of "fix this error" before things will actually work! Nevertheless, you are helping so many people by figuring all of this out!
You could also add the -y flag to the command and you wouldn't need to hit enter for yes at all. Nice videos, by the way.
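For anyone wondering what that looks like in practice, -y makes conda skip its "Proceed ([y]/n)?" confirmation:

```shell
# -y auto-answers "yes" to conda's proceed prompt, so the command runs unattended
conda create -n textgen python=3.10 -y
```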
Yep, had to figure that all out myself a few weeks ago. It was a ton of work to troubleshoot all of those issues. Definitely a non-trivial "one click install".
Agreed. Did you run into the same issues as I did?
Thanks for sharing 🎉
Looking forward to the fine tune video using web-UI
Good tutorial, but I think you forgot one thing: Visual Studio Build Tools. I believe the C++ dependencies are required to run oobabooga. I couldn't get oobabooga to install properly till I installed the build tools, and since then it works well. The GGML models are what I run, because something about 8 GB of VRAM not being enough to run other kinds of models. I have had 8 GB of VRAM for a decade; eventually Nvidia will break and will grudgingly and slowly begin to add VRAM to their cards.
How you went through all that in less than 10 minutes I don't know - it's taken me hours 🤣🤣But I'm finally there. Thank you 🙏🙏🙏
Hi Matthew, love this content! I was able to set up my own thanks to your guidance. I'm not sure how to use superbooga to upload and reference documents like with privateGPT - would you be able to show a tutorial on using textgenWebUI to upload and reference local documents?
Love the videos. They've got me looking into taking courses for a career change. I do have one question for now. What is the shortened way to get back on track if say, your computer turns off before I finished loading the model? I know some stuff is installed already, so I'm not sure what all I need to do to get it back up and running without redoing unnecessary steps?
Great content again ! Will I be able to add my own files to train like the PrivateGPT stuff ?
Really like to see how you train with TextGen. This is by far the best UI in the open-source world.
Damn good video.
Thank you!
Wow, actually just did this one hour ago. The one-click installer actually worked for me. But it was still nice to see.
The Windows zip worked for me as well. I had to run it twice though, as AMD wasn't supported, so I had to run it a second time and choose None. Kind of silly, but it worked.
Seems like others are having luck with the one-click installer. Glad it is working!
Awesome, thanks
Again a very helpful video. What about a LLM running local and having access to a vector storage (with my own data)? And/or being trained on my own data? I’d love to see a tutorial on this
I will second that. Would be huge.
I will third that. I'm currently trying to figure out if there is a way to use this so that I don't have to go through all the steps just to create an API to use something else.
I thought I needed to take coding classes, but not with your content. This is pretty amazing. You could probably sell courses on most of your content about how to work these models. Very valuable stuff, Matt. Thank you.
Also, because of the various bracket names, [ ] is quickly becoming called a "killbox", i.e. the default y is in a killbox ([y]/n).
This is great and I enjoyed watching the entire walkthrough, but it just convinced me I'm not willing to do that much work (do not have a strong programing background). Without the specs to run 30b or greater models, what's the upside to this install method compared to finding the best fit in the GPT4All ecosystem, which is a fraction of the effort to install or run (and doesn't need me to upgrade my video card)? Give us some demos of powerful tools that need this to work and the value of this walkthrough becomes much greater.
the install works w/o all those issues now
Sweet!!
This is cool, but it would be really cool to understand what models would work well on a Mac GPU that doesn't have CUDA.
I can't seem to get past the error at 3:34. I've searched the web and tried redoing everything, but I can't build the "wheel". Any suggestions?
I had to install Microsoft Visual Studio and make sure C++ was in the package. That fixed everything for me.
Thank god I found this video.
Great instructions. I'm not sure where I went wrong, but none of the models, in any mode, work to any extent of the definition of "working" for me. The best I got was in chat mode: one of the models would reply with nothing but blank messages. Other models just dump out the prompt I enter as the response. I tried 3 different models; I might as well have not downloaded the models at all. Running on Windows using the Conda setup, CPU only. I was hoping to play around with some models GPT4All doesn't support; not looking good on that front. It's literally doing nothing: I enter prompts and there's no CPU/memory usage increase, no reply.
I saw a pip command running standalone. In one of your previous videos I learned you want to run pip through python so it uses the Conda environment.
Thanks
Woohoo! My comment (about the default) was useful! :)
Thanks for that!!
Still getting the error:
"ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects"
even with the fix suggested. Any help would be appreciated.
I've just come across this error. I'm starting to research a solution. Has anyone figured out a solution to this? I'm running this Win64, Windows 10.
I would love to find out how to have the text gen webui set up as an LLM server and use API calls for inference, and have LangChain integrate it like the OpenAI API calls! I've tried but no luck just yet!
Videos with off-sync audio make me cry. Thanks though! Helped!
First of all, great video. I could have saved hours of researching if I had found this video earlier.
I still have a question though: how do I make API calls to this model loaded on the GPU? Does the WebUI have a library similar to FastAPI?
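On the API questions in this thread: the web UI could be launched with an API flag (e.g. python server.py --api), which at the time exposed a blocking endpoint at /api/v1/generate on port 5000. A minimal stdlib-only sketch, assuming that endpoint, port, and response shape (the helper names here are made up for illustration; check the repo's api-examples folder for the current contract):

```python
import json
import urllib.request

def build_payload(prompt: str, max_new_tokens: int = 200) -> bytes:
    """Build the JSON body for the (assumed) /api/v1/generate endpoint."""
    return json.dumps({"prompt": prompt, "max_new_tokens": max_new_tokens}).encode("utf-8")

def generate(prompt: str, host: str = "http://127.0.0.1:5000") -> str:
    """POST a prompt to a locally running text-generation-webui API server."""
    req = urllib.request.Request(
        f"{host}/api/v1/generate",
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Response shape assumed from the repo's example scripts at the time
        return json.loads(resp.read())["results"][0]["text"]
```

With a server running locally, generate("Hello") would return the model's completion as a string; LangChain could wrap the same endpoint in a custom LLM class.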
For the below errors:
ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects
Make sure to install Microsoft C++ Build Tools components (+CLI to work)
THANKS, was struggling with the recent versions and the error not getting fixed
Thanks, mate. This did the trick. I wish they would put the prerequisites on GitHub.
A tutorial for creating a LoRA with a text file would help a lot. Thank you anyway. :)
Hey, can you remember to enable closed captions on videos? I can't get the transcript off YouTube for ChatGPT.
Aren't they enabled by default? Am I missing something?
The suggested fix:
set "CMAKE_ARGS=-DLLAMA_OPENBLAS=on"
set "FORCE_CMAKE=1"
pip install llama-cpp-python --no-cache-dir
did not work for me!!
Same here, still trying to figure out a fix. It said to download Visual Studio 2022, which I did, but it's still not working.
Same. I've followed several guides online, but I keep getting the failed building wheel. I have installed Visual Studio and selected the correct packages, but it keeps saying it can't find Visual Studio 2019. There's no option to download 2019, and from what I've been finding, 2022 should work too. I gave up and went back to the one-click installer.
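One thing worth checking when that fix seems to do nothing: the set "..." syntax only works in cmd / the Anaconda Prompt. In PowerShell, set does not define environment variables, so pip never sees the flags. A hedged PowerShell equivalent would be:

```shell
# PowerShell equivalent of the cmd `set` lines; run these in the same session as pip
$env:CMAKE_ARGS = "-DLLAMA_OPENBLAS=on"
$env:FORCE_CMAKE = "1"
pip install llama-cpp-python --no-cache-dir
```

The C++ Build Tools requirement mentioned elsewhere in this thread still applies either way, since the wheel build compiles native code.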
Matthew, thanks for the videos. But this video was very confusing, if I may say so.
I have oobabooga installed and running fine, but I saw a lot of new and interesting things in this video (GPU). If it's not too much to ask, do an "oobabooga for dummies", lol! Another request: I can't run "pytorch...bin" models. I'm studying Python to help me, but I'm just starting. Thank you and may God bless you. (Marcos, chemistry teacher, Brazil)
I already installed it with the one-click setup. It worked in the end, but I think I'll do it again, since it is not really good. Only the latest models kind of gave a short response, but it stops after about 300 words or so.
I'm waiting for training video. ❤
👀
I get the "Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0" error when trying to run a model. The model loaded, but the AI answers my question with nothing.
Great instructions! But it failed in the end for me. Can you address the error "cannot import name 'prepare_model_for_kbit_training' from 'peft' (C:\Users\markt\AppData\Local\Programs\Python\Python310\lib\site-packages\peft\__init__.py)"? Thanks.
How does it work with coding? Does it output solid code snippets as well, formatted or not? 😊
What GPU do you own? I am shocked by how powerful yours is. Also, I can't seem to get CUDA to be available to PyTorch.
Hi Matthew, thanks for the very instructive video. I am not clear whether we put the largest file of the model in the 'models' folder of 'text_generation_webui_main' or all the files of the model. I also get this error: OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory models/TheBloke_Llama-2-7b-Chat-GGUF.
Thank you so much for not using the one-click installer - BECAUSE IT'S NOT THERE! And every video wants people to use it despite it not being there. Infuriating. THANK YOU SO MUCH!
I am getting the error "AssertionError: Torch not compiled with CUDA enabled" while loading a GPTQ model in the text generation UI. I am using a Mac M1. Please help.
Hello Matthew, thanks. I am getting this error: "If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True". Where should that be added?
Hey Matt, struggling to install the llama-cpp-python module on my Mac. Getting the error "note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llama-cpp-python". Any ideas?
I have followed all the above steps.
I keep getting a "'git' is not recognized as an internal or external command" error when pasting the clone URL after git clone. I'm running in the Anaconda Prompt; idk where I'm going wrong.
Did you ever make that training tutorial? This sounds like a really stupid problem, but I cannot figure out how to upload data from my Google Sheets.
If anyone still has a problem with building wheels for llama-cpp-python, despite doing what Matthew has shown, you have to download/update Visual Studio 2022
Working on confirming this; currently getting a "not a supported wheel" error.
did you solve this issue?
2 questions, please:
Can a local LLM access your hard disk to modify or create local files?
Can a local LLM grab recent files from the internet?
Thx
When the gui started at the top I only see Chat, Default, Notebook, Parameters, Model, Training, Session. Why does it look different?
For people that are here for wizardlm v1.2 by chance: GGUF is replacing GPTQ and GPTQ is becoming old (time runs fast). I could not run the GPTQ but could run the GGUF easily.
Will it cause any problems later on if we install all of this on a Windows system? I don't know how to install it on WSL, by the way.
Thanks a ton! I only have 1 question, once I shut down my PC and turn it on next day, how do I use the model again?
Just spin up the server again with python server.py and everything will still be there.
Does this work with ChatGPT or GPT-4 with API?
Like privateGPT can we add documents here?
Still failing to install llama-cpp-python after the set commands with:
CMake Error: CMAKE_C_COMPILER not set, after EnableLanguage
CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage
-- Configuring incomplete, errors occurred!
*** CMake configuration failed
[end of output]
Difference from your installation: it downloads
llama_cpp_python-0.2.11.tar.gz
instead of:
llama_cpp_python-0.1.64.tar.gz
Any fix ideas?
Kind of confused. I got up to executing the "python server.py" command but ran into an error that the directory doesn't exist.
Matt, the UI is great for testing, but how can we use it through the API?
Hey, does anyone know if there has been a change with the GPU acceleration? I have the same error Matthew has in the video, but the solution doesn't seem to work for me, and I cannot find a reference to GPU acceleration in the official repo. Thanks for the help.
I have an error when I try to use a model (in this case TheBloke/Wizard-Vicuna-30B-Uncensored-GPTQ); this is the error that appears under the download button:
Traceback (most recent call last): File "C:\Users\Qrka\text-generation-webui\modules\GPTQ_loader.py", line 17, in import llama_inference_offload ModuleNotFoundError: No module named 'llama_inference_offload'
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "C:\Users\Qrka\text-generation-webui\server.py", line 67, in load_model_wrapper shared.model, shared.tokenizer = load_model(shared.model_name, loader) File "C:\Users\Qrka\text-generation-webui\modules\models.py", line 74, in load_model output = load_func_maploader File "C:\Users\Qrka\text-generation-webui\modules\models.py", line 270, in GPTQ_loader import modules.GPTQ_loader File "C:\Users\Qrka\text-generation-webui\modules\GPTQ_loader.py", line 21, in sys.exit(-1) SystemExit: -1
How is it possible to connect TextGen WebUI from a local host to a remote host that has a GPU installed and a model saved there?
I am totally stuck at 3:47. I have entered the same commands you did:
set "CMAKE_ARGS=-DLLAMA_OPENBLAS=on"
set "FORCE_CMAKE=1"
pip install llama-cpp-python --no-cache-dir
I even copy-pasted the commands from your steps file, and I still get "ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects".
I'm having the same problem. I'm using windows 10
I also had that problem. I copied the error to ChatGPT Code Interpreter and ChatGPT found that Microsoft Visual Studio was missing.
@@petero864 I really hoped that would fix this, but not for me.. still stuck
@@jannekallio5047 Hey, did you by any chance solve the issue? Facing the same problem but no solution. Can't get it to run on GPU.
Hey, I am facing many problems running modules in this. Do you have a different way to run it?
Does anyone know how this should perform compared to the GPT4All client running GGML on CPU? I only tried the new GGML format on WebUI, so it should be the least demanding and perform comparably, but it's not even close. Everything seems to work as it should, and a 13B will generate text at around the speed of a 33B on the client.
I'm stuck on the "failed building wheel for llama" error. I already tried your solution, but it doesn't work. Please help.
I have 2 GB VRAM, 16 GB regular RAM, and an i7 from 2013. I just can't seem to load any models, be it on my GPU or CPU. It just tells me "RuntimeError: [enforce fail at …\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 35389440 bytes." no matter what model I try to load :/
I am still getting the error when I run pip install llama-cpp-python --no-cache-dir, even after I changed the commands. Is there any other way to install llama-cpp-python?