Nvidia Cuda, cuDNN, Conda, PyTorch and TensorFlow Installation with Ubuntu 22.04
- Added on 24 July 2024
- This video is all you need to get your Ubuntu 22.04 Deep Learning machine ready with the following:
1. Ubuntu Kernel 5.18 Update
2. Latest Nvidia Display Driver 515.57
3. Cuda Toolkit 11.7
4. cuDNN 8.0 Installation
5. Conda Cuda Toolkit 11.7
6. Python 3.9
7. Torch with GPU Support
8. TensorFlow with GPU support
GitHub Resources:
github.com/prodramp/DeepWorks...
▬▬▬▬▬▬ ⏰ TUTORIAL TIME STAMPS ⏰ ▬▬▬▬▬▬
- (00:00) Quick Intro
- (01:32) Ubuntu Kernel 5.18 Update
- (02:30) Nvidia Driver update 515.57
- (03:05) Driver install in Recovery Mode
- (04:40) Cuda Toolkit 11.7 Installation
- (05:24) Tools nvcc, gcc, g++, cmake check
- (06:06) cuDNN 8.x installation
- (09:32) Conda Cuda Toolkit 11.7 Installation
- (10:22) Python 3.9 and Torch test with GPU
- (10:45) TensorFlow Installation with GPU
- (11:15) Final installation validation
Connect
------------------
- Prodramp LLC (@prodramp)
- Website - prodramp.com
- LinkedIn - / prodramp
- GitHub- github.com/prodramp/
- AngelList - angel.co/company/prodramp
- Facebook - / prodramp
Content Creator: Avkash Chauhan (@avkashchauhan)
- / avkashchauhan
- / avkashchauhan
Tags:
#nvidia #ai #deeplearning #cnn #ml #lime #aicloud #h2oai #driverlessai #machinelearning #cloud #mlops #model #collaboration #deeplearning #modelserving #modeldeployment #pytorch #datarobot #datahub #streamlit #modeltesting #codeartifact #dataartifact #modelartifact #onnx #aws #kaggle #mapbox #lightgbm #xgboost #dataengineering #pandas #keras #tensorflow #tensorboard #cnn #prodramp #avkashchauhan #LIME #mli #xai #cuda #cuda-nn - Science & Technology
Thank you! I wouldn't have even known what questions to ask, but you have enumerated the process quite clearly. Keep up the good work!
Glad it was helpful! Thank you so much for your feedback.
Thank you. I have hated how difficult this process has been. I hope this video works!
Appreciate your comment. Thank you so much. It does work; several users have followed it successfully.
Hi Prodramp, thanks for the wonderful tutorial.
Appreciate your comment, and glad to be of assistance.
I think it is important to note that when this video was produced, kernel 5.18, the one the author updates to, was the newest kernel available. Otherwise some newcomers might think they have to downgrade their kernel to 5.18, which is not needed.
Thanks for the wonderful tutorial.
Great video. Could you please let us know how to set up such an environment when running Ubuntu on a 2013 MacBook Pro with Intel?
Hi, it is very useful. Is it mandatory to install Anaconda in the base environment and the CUDA toolkit in the new environment (dl39 in your video)?
I have a few questions because I want to install CUDA toolkit 11.7 and PyTorch 1.3.x with CUDA 11.7:
1: Can I install CUDA toolkit 11.7 with the latest Nvidia driver version, 535, on Ubuntu 20?
2: For the CUDA toolkit installation, you installed the toolkit twice, once by downloading it from the Nvidia website and once by running the conda command. Is it compulsory to install the conda-based toolkit as well?
Is it always recommended to install the latest Nvidia drivers? In my case I want to install cudatoolkit 11.3. Is there any incompatibility?
Sorry, but at the end of the video, near ~12:00, the output seems to show that no GPUs are found! Why is that?
I have problems with this procedure. Right at the beginning, in the kernel update, there is a mistake: instead of .deb it must be *.deb. I eventually figured this out. When I try to install the NVIDIA driver in recovery mode, the installation terminates because it needs cc, but cc is not found on my system (I specially prepared a clean system for this procedure). This makes it difficult to follow your instructions.
I have kernel 5.19.0-42-generic?
I am trying to set up a small workstation with 2 RTX 3060 GPUs, but I am not able to. Can you please guide me?
Manual CUDA installation is a bit hard to maintain. I recommend using Nvidia's CUDA containers with Docker. Once that is configured there is no issue; the gcc issues that happen with other packages are something Docker can handle.
how can I do this?
do you have any tutorial I could follow?
@@homerlol9058 You need to use Docker to achieve this.
Thank you for this - very useful! Just wondering whether you had a solution for jax not finding the GPU?
Appreciate your comment, thank you so much.
Yes, please check this out czcams.com/video/auksaSl8jlM/video.html
Hi Prodramp,
Thanks for your tutorial.
I did as you taught, but TensorFlow and PyTorch are not recognizing the GPU. I am able to see the GPU with nvidia-smi. Can you please advise?
You have to run a multistep inspection. First stick with PyTorch only and check why the GPU is not detected, then do the same for TF. It's hard to give you the steps here, sorry.
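A first step of that inspection could look like the sketch below. It is only an illustration, not the author's procedure: the function name gpu_report is made up here, and it assumes nothing beyond the public torch.cuda and tf.config calls. A framework that is not installed in the current environment is simply reported as missing.

```python
import importlib


def gpu_report():
    """Return a dict describing what each framework sees, skipping missing ones."""
    report = {}

    # PyTorch first, as the reply suggests.
    try:
        torch = importlib.import_module("torch")
        report["torch"] = {
            "cuda_available": torch.cuda.is_available(),
            "built_for_cuda": torch.version.cuda,  # None on CPU-only builds
            "device_count": torch.cuda.device_count(),
        }
    except ImportError:
        report["torch"] = "not installed"

    # Then TensorFlow.
    try:
        tf = importlib.import_module("tensorflow")
        gpus = tf.config.list_physical_devices("GPU")
        report["tensorflow"] = {"gpus": [gpu.name for gpu in gpus]}
    except ImportError:
        report["tensorflow"] = "not installed"

    return report


if __name__ == "__main__":
    for name, info in gpu_report().items():
        print(name, "->", info)
```

If built_for_cuda is None, the installed PyTorch is a CPU-only build, and no driver change will make it see the GPU.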
Hi thank you for the tutorial. I have a question, during the driver install, I had a request for “install sign kernel” and things didn’t work out. I tried to install it but got an error because secure boot is enabled. Should I disable it? And how should I do that?
You can go back, boot the kernel at root level, and install the driver in root mode to avoid the error. If you trust the driver, it should be okay. It comes down to your preference as the user and what the driver needs.
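Before deciding either way, it may help to confirm whether Secure Boot is actually on. The sketch below is an assumption layered on top of the reply: it relies on the mokutil tool, which the thread never mentions, and the helper names are made up for illustration. With Secure Boot enabled the kernel only loads signed modules, which is why the installer asks about signing.

```python
import shutil
import subprocess


def parse_sb_state(text):
    """Map the output of `mokutil --sb-state` to True/False, or None if unclear."""
    lowered = text.lower()
    if "secureboot enabled" in lowered:
        return True
    if "secureboot disabled" in lowered:
        return False
    return None


def secure_boot_enabled():
    """Run mokutil if present; return None when the state cannot be determined."""
    if shutil.which("mokutil") is None:
        return None
    result = subprocess.run(
        ["mokutil", "--sb-state"], capture_output=True, text=True, check=False
    )
    return parse_sb_state(result.stdout + result.stderr)


if __name__ == "__main__":
    print("Secure Boot enabled:", secure_boot_enabled())
```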
Thanks for the comment, appreciate it.
Thank you, Prodramp, for this helpful tutorial. However, from 3:05 to 4:39, I have no idea what you are talking about. I am a machine learning Ph.D. and just bought a PC for my own projects. I just installed Ubuntu 22.04, and I am trying to set up the environment. Sorry, but I have never learned about 'start mode', 'recovery mode', or 'user prompt'. Would you please explain more about the procedure? I would really appreciate it!
Let me explain what is going on. When a display driver is already installed and loaded, you cannot simply overwrite it; installing over it gives you an error unless the installer has built-in support for continuing the installation after a restart. What I did was start the Ubuntu machine in recovery mode. In this mode only the Linux kernel is loaded, with a few essential drivers (disk, network, etc.). At that point installing the display driver is very easy, because no display driver is loaded, so there is nothing to overwrite and no error. Every Linux installation supports both recovery mode and normal mode, and recovery mode is used to install drivers or fix various errors that cannot be fixed in regular mode. You will need to learn these methods to be an effective Ubuntu user. Hope this answers your question(s).
The CUDA version in the nvidia-smi output does not show the CUDA toolkit version that is actually installed; it shows the latest CUDA version supported by the current driver. To check the actually installed CUDA version, use the nvcc --version command.
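As a rough sketch of that check from Python, the snippet below reads the toolkit version out of the nvcc --version output. The helper names are made up for illustration, and the parsing assumes nvcc prints a line containing "release X.Y", which is how the toolkit reports it.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract e.g. '11.7' from the 'release 11.7, ...' line of nvcc output."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def installed_toolkit_version():
    """Return the toolkit version reported by nvcc, or None if nvcc is not on PATH."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print("Installed CUDA toolkit:", installed_toolkit_version())
```

A result of None means nvcc is not reachable from this shell, which is a PATH question rather than proof that the toolkit is missing.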
Appreciate your feedback. Thanks.
@@650AILab you are welcome
I followed your steps and installed the driver on reboot successfully, but I am still getting this error:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Please help
Most driver problems are recorded in the installation logs, so if you read the log you will find the exact reason for your trouble. If you share the error, I will be happy to give my feedback on how to solve it.
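Alongside the log, one quick thing to rule out is whether the nvidia kernel module is loaded at all, since that error commonly appears when it is not. This is a sketch under that assumption, not something stated in the thread; it reads /proc/modules, where the first column of each line is a module name, and the helper names are made up.

```python
from pathlib import Path


def loaded_modules(proc_modules_text):
    """Return the set of module names from /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def nvidia_module_loaded(path="/proc/modules"):
    """True if a module named 'nvidia' is loaded; False if not or if the file is absent."""
    modules_file = Path(path)
    if not modules_file.exists():
        return False
    return "nvidia" in loaded_modules(modules_file.read_text())


if __name__ == "__main__":
    print("nvidia module loaded:", nvidia_module_loaded())
```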
Thanks for your comment and feedback, sincerely appreciate it.
for MX Linux users installing cuda as deb package:
sudo add-apt-repository contrib
doesn't work out of the box, use instead:
sudo apt-get install software-properties-common
Appreciate your comment. Thank you so much for sharing this information; it will definitely be useful for someone.
Hey,
It's a great video. I was able to follow the whole thing, and it is explained very well.
A small correction to the Ubuntu Kernel 5.18 Update section:
the command to install all the .deb packages is
sudo dpkg -i *.deb
Also, after installing CUDA,
you need to add its paths to .bashrc:
cd /home/$USER/
nano .bashrc
Add the lines below:
export PATH="/usr/local/cuda-11.7/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH"
Now nvcc --version will work (open a new terminal first so that .bashrc is read again).
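For a different toolkit version the two export lines change only in the directory name. The sketch below builds them and checks whether the bin directory is already in PATH. It assumes the default /usr/local/cuda-<version> install location used in the comment; the function names are made up for illustration.

```python
import os


def cuda_env_lines(version="11.7", prefix="/usr/local"):
    """Build the two export lines for a toolkit installed under prefix/cuda-<version>."""
    root = f"{prefix}/cuda-{version}"
    return [
        f'export PATH="{root}/bin:$PATH"',
        f'export LD_LIBRARY_PATH="{root}/lib64:$LD_LIBRARY_PATH"',
    ]


def cuda_bin_on_path(version="11.7", prefix="/usr/local", path_value=None):
    """Check whether the toolkit's bin directory already appears in PATH."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = f"{prefix}/cuda-{version}/bin"
    entries = [entry.rstrip("/") for entry in path_value.split(os.pathsep)]
    return target in entries


if __name__ == "__main__":
    if not cuda_bin_on_path():
        print("\n".join(cuda_env_lines()))
```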
I am glad you enjoyed it and found it useful. Thanks for the comment.
I thought the CUDA toolkit downloads the Driver automatically?
When I check whether the GPU is accessible, I get 10 messages saying it is not and a final one saying the GPU is fine. But it works, and my network is training on the GPU.
It all depends on the Python library you are using and whether that library has access to the GPU. This video covers Torch and TensorFlow support with GPU, and my latest video shows jax/jaxlib support with GPU.
Hey there, I want to know if I need to update the kernel in order to get everything working, because the LTS supports only up to 5.17 and I worry that I will break something. Also, if I do need to update the kernel, does it make sense to update it to 5.19, since that is what Ubuntu 22.10 now uses?
Thanks for the comment and for asking the question, appreciate it.
I will not jump to 22.10 or upgrade the kernel to 5.19 unless there is a definite need.
As of now, 5.18 is a very stable kernel on 22.04, which is an LTS release and is what I am running on my machine, so I have no need to upgrade either the kernel or the Ubuntu release.
Hi, two questions:
1. Is the 5.18 kernel necessary?
2. When I try to install 5.18 kernel it breaks my machine, probably due to having very new hardware in my rig. Can I install an earlier kernel ( say 5.15 ) and then just keep going with the installation and everything will work fine?
Yes, the 5.15 kernel will work exactly the same. I had everything working with 5.15 first, later upgraded to 5.18 and applied all my steps, and there were no issues with either kernel. All the very best.
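To see which kernel a machine is actually running and compare it with the versions mentioned here, something like the following sketch works. The (5, 15) default is taken only from this reply as the oldest kernel reported to work, not from any official requirement, and the helper names are made up.

```python
import platform
import re


def kernel_major_minor(release):
    """Turn a release string such as '5.19.0-42-generic' into (5, 19)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def at_least(release, minimum=(5, 15)):
    """True if the release is the given (major, minor) version or newer."""
    return kernel_major_minor(release) >= minimum


if __name__ == "__main__":
    # platform.release() returns the running kernel release on Linux.
    print("running kernel:", platform.release())
```

For example, at_least("5.19.0-42-generic") is true, while at_least("5.4.0-100-generic") is false.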
@@650AILab Thank you very much for the helpful video and your work! I have one question, though. I am afraid that everything will break after a system update due to conflicts. You mentioned that the workstation worked for you with kernel 5.15 and that you then upgraded to 5.18. After that, did you purge the CUDA toolkit and cuDNN and reinstall them?
Hi, thanks for the tutorial. I have some questions:
1. How did you decide to do the whole process on kernel 5.18? Will it be the same for 5.19?
2. I have an Nvidia GTX 3050, but when looking for the driver I have two options, one with Ti and another without Ti. Does "Ti" stand for Titan?
use the ti version, the one without ti won't work, most likely.
@@lollol-bh5uw thanks. Do you know why is that?
Every time I reboot after nvidia driver installation I get a "oh no something went wrong" screen. I tried to follow your directions, but dpkg of linux-modules won't install because the kernel isn't installed, and the kernel won't install because the linux-modules aren't installed.
Thanks for your comment, appreciate it.
Please start Ubuntu in recovery mode with networking first, and then install the packages directly from the command line.
@@650AILab Thanks for your quick reply! Unfortunately I can’t boot into recovery, it also goes to “oh no something went wrong”. I’m beginning to think reinstalling is my only way out.
Hello, at the beginning everything installed but nothing was shown. After restarting the terminal, nvidia-smi showed CUDA but nvcc did not. I solved that as follows:
Check whether nvcc is in your PATH with "whereis nvcc". If it returns only "nvcc:", you need to add the two lines below to ".bashrc".
The ".bashrc" file path is usually something like "/home/username/.bashrc". Add the two lines below (replace the CUDA version with your installed version):
export PATH="/usr/local/cuda-11.4/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-11.4/lib64:$LD_LIBRARY_PATH"
Then save and close the file.
Check with "nvcc --version".
Hope that helps someone. I used it because nvidia-smi showed CUDA but nvcc --version did not.
Perfect, thanks for sharing your tip. Appreciate your comment.
When you use the "whereis" command, it looks for the binary in the search path(s), and if the binary is not in any of them, it will not be found. So if you know you have the binary, it is best to add its directory to the PATH so that it is accessible to the OS and all other tools.
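The same idea can be seen from Python with shutil.which, which performs a plain PATH lookup. The sketch below uses a throwaway directory with a dummy nvcc file as a stand-in for the real CUDA bin directory, so everything in it is illustrative: the lookup fails until the directory containing the binary is part of the search path.

```python
import os
import shutil
import stat
import tempfile


def find_on_path(program, path_value=None):
    """Return the full path of `program` if a PATH lookup finds it, else None."""
    return shutil.which(program, path=path_value)


# Demonstration with a throwaway directory standing in for a CUDA bin directory.
with tempfile.TemporaryDirectory() as fake_bin:
    fake_nvcc = os.path.join(fake_bin, "nvcc")
    with open(fake_nvcc, "w") as handle:
        handle.write("#!/bin/sh\n")
    # Mark the dummy file executable so the lookup accepts it.
    os.chmod(fake_nvcc, os.stat(fake_nvcc).st_mode | stat.S_IXUSR)

    # Search path without the directory: not found.
    missing = find_on_path("nvcc", path_value="/usr/bin-does-not-exist")
    # Search path that includes the directory: found.
    found = find_on_path("nvcc", path_value=fake_bin)

print(missing is None, found == fake_nvcc)
```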
sorry for asking ... how do you get that information output in the terminal at the left with the ubuntu logo and all the useful information
screenfetch
Which version of pytorch did you install / build and how? That is the whole issue that I'm having with 22.04. The official pytorch releases support cuda 11.3 and 11.6 (My ubuntu has 11.5 and can be updated to 11.7... What are the odds..? ).
I do not use Python directly; instead I primarily use Conda to create Python environments. With Conda you can run "conda install -c pytorch pytorch", and this will install PyTorch (pytorch/1.12.0/py3.9_cuda11.3_cudnn8.3.2_0/pytorch) for your Conda-based Python 3.9 environment on Ubuntu 22.04.
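The build string in that package name already says which CUDA and cuDNN versions the package was built against. A small sketch for reading it follows; the regular expression assumes the py…_cuda…_cudnn…_ layout seen in this one example, and other builds may be named differently.

```python
import re


def parse_build_string(build):
    """Pull Python, CUDA and cuDNN versions out of a conda PyTorch build string."""
    pattern = r"py(?P<python>[\d.]+)_cuda(?P<cuda>[\d.]+)_cudnn(?P<cudnn>[\d.]+)_"
    match = re.search(pattern, build)
    return match.groupdict() if match else None


info = parse_build_string("py3.9_cuda11.3_cudnn8.3.2_0")
print(info)
```

Here the package reports CUDA 11.3, not the 11.7 toolkit installed system-wide, which is the mismatch the question is asking about.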
@@650AILab Yes, I was (am) using conda environments as well, but still had issues, so I thought that the cuda-version (cuda-toolkit) still needs to match up with the cuda-version of the system install (11.5 in my case). So I tried a bunch of things while failing. But while going down to PyTorch 1.11, I realized that something wasn't quite right with my NVIDIA-packages, I reinstalled those and at least PyTorch 1.11 with a non 11.5-cuda (11.3 I think it was) started working. Maybe PyTorch 1.12 will too, with whatever CUDA-toolkit-versions they are packaged with (assuming that the display drivers / card support that particular version)
@@JWAM Please try updating Cuda/Conda/Python from scratch, as it worked for me. Hoping for positive results.
@@JWAM Did you find a solution?
found the solution: What did the trick for me was to first call conda install -c nvidia/label/cuda-11.7.0 cuda-toolkit and only then install pytorch (without cudatoolkit)
Glad you got it working.
The prompt you got from Python after downloading TensorFlow clearly shows that it isn't supporting the GPU.
Same with Torch.
I am not sure if I understood you correctly... Thanks for the comments appreciate it.
Hey there, thanks so much for your tutorial! When I try to install the Cuda Package, i get this error: The public CUDA GPG key does not appear to be installed.
To install the key, run this command:
sudo cp /var/cuda-repo-ubuntu2204-11-7-local/cuda-46B62B5F-keyring.gpg /usr/share/keyrings/
When I then try to run the command, nothing happens. Any idea on what it could be? Help would be very appreciated!
When you run the command there will be no output, and if you then run the next command it will work as expected. I am sure you are doing it correctly.
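Since cp prints nothing on success, one way to reassure yourself is to check that the keyring file really landed in the target directory. The sketch below does that; the helper name is made up, and the demonstration runs against a temporary directory instead of the real /usr/share/keyrings so it is safe to try anywhere.

```python
import tempfile
from pathlib import Path


def keyring_installed(keyring_name, keyrings_dir="/usr/share/keyrings"):
    """True if the named keyring file exists and is non-empty in keyrings_dir."""
    candidate = Path(keyrings_dir) / keyring_name
    return candidate.is_file() and candidate.stat().st_size > 0


# Demonstration against a throwaway directory rather than the real system path.
with tempfile.TemporaryDirectory() as fake_keyrings:
    name = "cuda-46B62B5F-keyring.gpg"
    before = keyring_installed(name, fake_keyrings)
    # Stand-in for the silent `sudo cp ...` step.
    (Path(fake_keyrings) / name).write_bytes(b"placeholder")
    after = keyring_installed(name, fake_keyrings)

print(before, after)
```

On a real system, calling keyring_installed("cuda-46B62B5F-keyring.gpg") after the cp command should return True.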