How to run Ollama on Docker
- Added 25 Feb 2024
- Ollama runs great on Docker, but there are just a couple things to keep in mind. This covers them all.
Visit hub.docker.com/r/ollama/ollama for more details.
Be sure to sign up to my monthly newsletter at technovangelist.com/newsletter
And if interested in supporting me, sign up for my patreon at / technovangelist
someone just commented about finding another way to upgrade the container. I can't find the comment now, so if this was you, post again. But no, do not upgrade the install inside a container; that's a whole lot of work for no benefit. The models are stored in the volume you mounted as part of the install, so deleting the image will not affect the models. If you have gone against the recommendations and stored models inside the container, then the best approach is to move them to the correct spot and update the container.
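The upgrade path described above can be sketched as follows (a sketch, assuming the container is named `ollama` and the models live in a volume named `ollama`; adjust names to your setup):

```shell
# Pull the newer image; models live in the mounted volume, not the image
docker pull ollama/ollama

# Remove the old container (the volume and its models are untouched)
docker stop ollama
docker rm ollama

# Recreate the container on the new image, reattaching the same volume
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```

Because the models were never inside the container's writable layer, recreating the container costs nothing but a few seconds of downtime.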
I really wish more videos were made like this. No nonsense, gets straight to the point, clear, concise. Thank you.
And yet some complain that I take too long and waste time. But thank you so much for the comment. I do appreciate it.
@technovangelist Amazing video! Finally, I understand docker.
How you speak in your videos is refreshing thank you 🙏🏻
Matt, you're a great teacher; no one explains things like you do. They just read the command in one sentence and don't explain the actual function of that command in parts. Lots of videos show how to do something and 75% never work. So thanks so much!
Thanks Matt this helped me understand the Docker side of things. Namely keeping the models in a volume. I will restructure my project based on this. Keep it up ❤
much respect for the way you deliver information
thank you Sir! You just took the mystery of how to set this up right. I love me some docker. It really helps to keep the work stuff separated from the personal project stuff.
Straight to the point, no fluff, very informative. Very updated. You just earned a fan/subscriber. Howdy Matt 🎩
there are some who say I am all fluff, but I try to always be closer to your observation.
This is my new favorite channel! I learned like 10 things just in this video.
I love learning about AI, modern tools such as docker and tailscale, and modern hosting platforms and services. Thank you!
you left off the most important part.... NERF can be expensed!!
Good point! So I learned 11 things :) @vangelist
🔥good stuff as always Matt!
A fantastic, clear instructional. Thank you so much! This helped me a ton.
Thanks, Matt! It was a straightforward tutorial.
Learned that containers can be remote and the alias. Yet another great video! I need to take advantage of that. I have a bunch of RPI security cameras and remote containers might make administration even easier!
Nice and clear tutorial. Thanks! 😀
Does it make sense to use ollama in production as a server?
thanks for your help!
In the realm of ones and zeros and LLM models, Matt is the undisputed sovereign.
wow, you are too kind
you're amazing bro
This is my new favorite content, the way you explain it just beams directly into my brain and i get it right away. Thank you. Is there a way to show support, donations or similar?
Folks have asked me about that. I’ll be looking into something like Patreon soon.
The big thing for now is to just share the video with everyone you know.
Well I do have that patreon now. Just set it up: patreon.com/technovangelist
There are not many people who can explain these steps in such an easy and entertaining way as you do, Matt. I often pride myself on being able to do so, but you can be my teacher. I often find myself watching the progress bar because I don't want it to end (seriously :-))!
A request: could you do an explainer video on how to train a model (say Microsoft/Phi-2) on your own dataset and deploy the trained model? OpenAI makes it super easy by deploying a JSONL file and after a while it 'returns' the trained model. But I want to train my own models.
I have been looking around YT but get lost in parameters, incorrect JSONL files (or csv), etc. Surely, this must be easier. (hopefully your answer is "it is easier, and don't call me Shirley")
Thanks so much again. You have a happy subscriber (and many more to come).
Kind regards,
Bas
Searching for a full airgap install on docker to use on Kubernetes. This is a start ^^. Thx
Great video. Now I'm curious about how you set up ollama on brev... What is your recommended setup & host service for using Ollama as an endpoint?
Excellent, Matt!
For some reason, I had to run docker commands with "sudo" , to use my GPUs.
That sounds like your user is not in the right group. I once had issues like that, and it was a matter of not being in the docker group. Now I can use my gpu in my docker container.
good answer. I knew it but couldn't remember, and this is a good reminder.
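For anyone hitting the same sudo issue, the group fix discussed above looks roughly like this on a Linux host (a sketch; the CUDA image tag is just an example for verifying GPU access):

```shell
# Add your user to the docker group so sudo isn't needed
sudo usermod -aG docker "$USER"

# Log out and back in, or start a shell with the new group, for it to take effect
newgrp docker

# Verify GPU access without sudo (requires the NVIDIA Container Toolkit installed)
docker run --rm --gpus=all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi
```

If `nvidia-smi` prints your GPU from inside the container, ollama's container will see it too.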
This is exactly what I've been looking for!
Could you please tell (or maybe create a video on) how to use ollama completely offline? I have a PC that I cannot connect to the internet.
Nice video. Could you also please make a demo video on how to use ollama via nix (nix shell or on nixos)?
Is it possible to limit the resource consumption of ollama?
I'm looking for a way to run a background computation, and I don't really care how much time it takes (as long as it can process a stream's average load), but it would be annoying if it hung the main activity on the machine.
Great video as always Matt, I love them. I would like to know how to load a custom model in docker with a model file. Thank you so much.
Same way as without docker: you create the model using the modelfile, then run it. Or am I missing something?
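Inside docker, the modelfile workflow just gains a `docker cp` and `docker exec` wrapper. A sketch, assuming the container is named `ollama` and `mymodel` is a placeholder name:

```shell
# Copy a Modelfile from the host into the running container
docker cp ./Modelfile ollama:/root/Modelfile

# Create the model from it, exactly as you would outside docker
docker exec -it ollama ollama create mymodel -f /root/Modelfile

# Run the new model
docker exec -it ollama ollama run mymodel
```

The created model lands in the mounted volume, so it survives container recreation like any pulled model.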
Hi Matt, your videos are super useful and right on point. Thank you for putting this together.
I have a quick question on this topic. I have created a RAG streamlit app in python using Ollama llama3 and ChromaDB. The app runs fine on my Mac localhost, but I wanted to create a docker image of this app. I am unable to figure out how to include Ollama llama3 in my docker image. Can you point to any resources that cover this, or cover it in one of your topics?
Again,thanks a mil for the content. Great stuff!!! Cheers
Did you find any resources?
I've been running a lot via Docker, but when I found out about the difficulty of GPU pass-through (on any machine) I have been swapping things over to proxmox, which does have GPU pass-through *and* can also use the CPU to emulate a GPU as needed... what do you think about running on Proxmox?
Make your Ollama even better by installing Open WebUI in a second container. This even runs on my Raspi5!
Some like the webui, but that's a personal thing. It's an alternative.
Hi Matt, thank you for such a clear and concise explanation!!
I have a question that may or may not apply in this context, and I'll let you be the judge of it.
I'm running on CPU on an 8 virtual core server with 30GB RAM and an NVMe disk on ubuntu 22.04, and the performance is kind of poor (and I clearly understand that a GPU would be the straightforward way to solve this).
But I've noticed that when I run the models, for example Mistral 7b, ollama only uses about half the CPUs available and less than 1 GB of RAM. I'm not sure why it is not using all the resources available, or whether using them would improve the performance. Anyway, it would be great to have your advice on this, and if it is something that can be improved or configured, how would you suggest doing it?
Thank you very much!!!
You will need a GPU. Maybe a faster CPU would help, but the GPU is going to be the easier approach. You will see 1 or 2 orders of magnitude improvement adding even a pretty cheap GPU from nvidia or amd.
@@technovangelist Thank you! I know the GPU is the natural way to go.
I was just wondering why it is using less than half the resources available, when it has plenty of extra CPU and RAM; and whether using these idle resources could improve the performance at least by some percentage.
And unfortunately I can't add GPU to this current configuration I have. My CPUs are AMD EPYC 7282 16-Core Processor which I think are quite nice CPUs.
Thank you!!
hey matt, appreciate your content - has been very helpful to get everything running so far! I am on a windows 11 pc and managed to get ollama + anythingllm running on docker and communicate w/ each other. Now I want to try to get llms from hugging face to run in the dockerized ollama. I saw how it works, if you have ollama installed directly on the system. But how do I approach this with using docker?
Thanks in advance and keep it up 👏
Is the model not already in the library? You can import, but it can be a bit of extra work. Check out the import doc in the docs folder of the ollama repository.
ah yes they are, but I meant custom trained llms - I stumbled across the open_llm_leaderboard and wanted to give those a try - will check out the import doc, thanks!
Running the docker container as the ROOT user is not secure. Is there any way to run it as a non-root user?
Does it consume less resources and run better with OrbStack instead of with Docker Desktop?
did you hear about jan ai? Would be good to get a tutorial for docker. Thanks
how do you swap models in the same container? I think I'm doing it wrong and it's affecting my container memory
Just found this channel. Could you make a video tutorial on how to use it inside vscode for code completions?
Cool tutorial. Can you also show how we can integrate ollama docker with other programs, say, langchain script inside docker. How to connect both of them together or separately?
Thanks!
would love to see a good example of using langchain. often folks use it for rag where it only adds complexity. Do you have a good usecase?
So, I think you cleared up most of the problems I have been having trying to get this set up, but I have one last one that I just can't seem to get past. My setup is on proxmox. I first created an lxc container, and once I had NVidia passthrough working for my gpu, I installed ollama and downloaded my first model. That all went fine. Then I tried to see if the api was listening on port 11434 by opening a browser and going to the address:11434; according to the documentation I should get a message that ollama is ready. Unfortunately, I get no errors, the page simply doesn't open. So I approached it from the other side and just created an lxc and installed docker and portainer on it. Much to my surprise, when I navigated to the address I got the message that ollama is ready. My question is: why? I'm sure this is something easy that I'm missing, but 24 hours later I am still not sure why. Any ideas?
I recently had the opportunity to try Ollama in docker and it worked pretty much as shown in this video. I do think it would be nice if it were somehow possible to start a container and have it ready to serve a model immediately, but I couldn't find an easy way to do this. You basically have to run one docker command to start Ollama, then wait a bit, then run another `docker exec -it` command to tell Ollama to load whatever model you happen to need. How do I achieve the same thing using just one docker command?
docker-compose
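To expand on that one-word answer: a compose file gives you a single `docker compose up -d` to bring the server up. A minimal sketch (service and volume names are examples; the GPU stanza is only needed on an nvidia machine with the container toolkit):

```yaml
# docker-compose.yml - minimal sketch for the ollama server
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    # Uncomment on a host with an nvidia GPU and the NVIDIA Container Toolkit:
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: all
    #           capabilities: [gpu]
volumes:
  ollama:
```

Preloading a model still takes one extra step the first time, e.g. `docker compose exec ollama ollama pull llama2` (model name is an example); after that the model is in the volume and the single `up` command is enough.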
is it possible to manage multiple instances of ollama on docker for scaling the ollamas for production? how ?
You could but it will result in lower performance for everyone.
Could you kindly assist me in clarifying how to specify the model name when running the ollama Docker command?
For instance, I aim to utilize the mistral and llama2:13b models in my project.
Thus, I request our dev-ops team to launch an ollama container configured with these specific models.
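One way a dev-ops team could stage those exact models is to start the container, then pull each model into the mounted volume by name. A sketch, assuming the container is named `ollama`:

```shell
# Start the server with a persistent volume for models
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Pull the specific models the project needs; they land in the volume
docker exec ollama ollama pull mistral
docker exec ollama ollama pull llama2:13b

# Confirm both models are available
docker exec ollama ollama list
```

Because the models live in the volume, recreating the container later does not require pulling them again.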
I would like to suggest that ollama support embeddings, when it becomes available through the REST API. If they really chose nomic-ai/nomic-embed-text-v1.5-GGUF, it would be perfect, as this model is multi-language.
It does support embeddings. Using Nomic-embed-text. Check out the previous video. It covers that topic.
Kindly make a tutorial to fine tune an open source LLM model on many pdfs data. The fine tuned LLM must be able to answer the questions from the pdfs accurately.
Any idea where I can see docker-style logs for a local install (i.e. not docker) on mac?
If it’s a local install that isn’t docker there is no docker log
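That said, the native install does keep its own server log. If memory serves, on macOS it lives under `~/.ollama/logs` (path worth verifying against ollama's troubleshooting docs for your version):

```shell
# Follow the native macOS install's server log
tail -f ~/.ollama/logs/server.log
```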
For Windows, the recommended way would be to use WSL(2), since that's a container in itself
Well, the recommended way on windows is the native install, but after that it's docker. And wsl is a vm, not a container; Ubuntu on wsl is a container that runs inside the wsl vm.
Loved the video…can you do something similar for LocalAI. Thanks!
Hmm. Never heard of it before now. I’ll take a look
@@technovangelist github.com/mudler/LocalAI - Similar to Ollama in many respects. One more tool for you to learn :)
would ollama still work on a machine with no graphics card?
Absolutely. It will just be 1-2 orders of magnitude slower. The work models do requires a lot of math that gpus really help accelerate.
Hey Guys.... Mistral just launched their new model named Large!
Hi, would you also share podman commands ? Did you give it a try ?
I tried it a bit when Bret Fisher and I had them on the show we hosted together, but I didn't have much reason to stop using docker. I didn't see any benefit.
@technovangelist Thanks for the feedback. It was not about dropping docker, but rather being sure both work, as in some cases podman is used (because of rootless mode, e.g.) and not docker. So it may help some of us spread ollama even in these cases in an enterprise context ;-p
I thought they were supposed to be command line compatible. should be the same, right? Try it and let us know.
Can I run Ollama without having access to internet? I would like to run it locally, with no internet connection at all.
Yes
I wish there was a clean way to launch an Ollama docker container with a preconfigured set of models so it would serve and then immediately pull the models. We are overriding the image’s entry point right now to run a script shell that does this…
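The entrypoint-override approach described above could look roughly like this (a sketch; the binary path and model names are assumptions, and the real ollama image's layout may differ):

```shell
#!/bin/sh
# entrypoint.sh - start the server, wait for it, pull models, then keep serving

# Launch the server in the background
/bin/ollama serve &
pid=$!

# Wait until the API answers before pulling
until ollama list >/dev/null 2>&1; do
  sleep 1
done

# Preconfigured models (examples); already-present models are a fast no-op
ollama pull llama2
ollama pull mistral

# Hand control back to the server process
wait $pid
```

Mount the script into the container and point `--entrypoint` at it, e.g. `docker run -v ./entrypoint.sh:/entrypoint.sh --entrypoint /entrypoint.sh ... ollama/ollama`.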
some of the model links are broken, so I had to add it to requirements and edit the Dockerfile
What do you mean by that? Is this in a file you made?
@@technovangelist I was referring to the Ollama Webui, perhaps this isn't the same repo?
different product made by unrelated folks
how to connect this ollama server with a streamlit app and run both on docker
From the question, I understood you should use two containers with different ports:
1. Ollama
2. Streamlit app
Run them separately and access Ollama's api from the front-end app.
It's better to write a Dockerfile and package it as a docker image.
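The two-container layout above could be wired together with compose; within the compose network, the app reaches the server by service name. A sketch (names and the `OLLAMA_HOST` convention are assumptions; your app may read the base URL differently):

```yaml
# docker-compose.yml - sketch of the ollama + streamlit pair
services:
  ollama:
    image: ollama/ollama
    volumes:
      - ollama:/root/.ollama
  app:
    build: .            # your streamlit app's Dockerfile
    ports:
      - "8501:8501"
    environment:
      # Inside the compose network the server is reachable by service name
      - OLLAMA_HOST=http://ollama:11434
    depends_on:
      - ollama
volumes:
  ollama:
```

Note the ollama service needs no published port here: only the app container talks to it.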
I am wondering where I am going wrong. I followed step-by-step but when I run ollama run llama3 I get 'ollama: command not found'
Are you running that in the container? If not you need to run the right docker command. But best if you don’t use docker if you aren’t a docker user
@technovangelist Thank you for the quick reply. I was running it in one; however, I decided it would be faster to use a spare NUC I had, running Windows. The LXC I was testing it in, even though I gave it 8 cores, was still sort of slow, even using llama3:text.
Without a gpu it will be slow
In the wsl2 docker version, DON'T PUT MODELS OUTSIDE WSL2 on a mounted windows drive: I/O performance will be ~15x slower.
Yup. Pretty standard stuff for docker and virtualization. Docker on wsl with Ubuntu means the ollama container is running in the Ubuntu container on the wsl virtual machine. Each level of abstraction slows things down. And translation between levels is going to be slow.
Hi bro.
When I try to deploy ollama on aws lambda with an ecr docker image I am getting an error. Can you please help me?
Error: http: connecterror: [errno 111] connection refused
Thank you
need a lot more info. where do you see that. when in the process? Is running a container like that even going to be possible? Do you have access to a gpu with lambda? if not, its going to be an expensive way to go.
How to run as a dockerfile
Yes that’s what this video shows
@technovangelist I mean, how to run it from a dockerfile, not as a set of docker commands.
Docker commands that run an image built using the dockerfile
Mac not allowing GPU pass through is a huge limitation
Docker has known about the issue for a long time. But mostly it's because there aren't linux drivers for the apple silicon gpu.
If anyone has insights on a particular LLM model that has a low hallucination rate on Kubernetes native resource generation, please leave me a comment ;-). Thx
Usually when someone has a hard time with the output of a model it points to a bad prompt rather than a bad model.
Mistral:Instruct is a solid choice for a range of tasks
VID SUGGESTION ~ (Resolve Error Response: Invalid Volume Specification) - Thanks
test@xz97:~$ docker run -d --gpus=all -v /home/test/models/:root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker: Error response from daemon: invalid volume specification: '/home/test/_models/:root/.ollama': invalid mount config for type "bind": invalid mount path: 'root/.ollama' mount path must be absolute.
that's more of a support request... the error message tells you all you need: you specified a relative path rather than an absolute one. Refer to the docs on docker hub for the image.
@technovangelist Thanks Matt for your reply. Discovered there is a "/" slash missing before root. (Problem solved.) Thanks again.
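For anyone else hitting this, the fix from the thread is just the leading slash on the container-side path, which must be absolute:

```shell
# Corrected command: /root/.ollama (absolute), not root/.ollama (relative)
docker run -d --gpus=all -v /home/test/models/:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```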
waaiiiit wait wait a sec... i specifically remember in a vid (don't remember which, it's been months) that on Mac in order for ollama to utilize "metal" 3d acceleration that it needs to run in docker... strange 🫤
Sorry. You must have remembered that wrong. Docker on Mac with apple silicon has no access to the gpu. And ollama doesn’t work with the gpu on Intel Macs either