How to run Ollama on Docker

  • Published 25 Feb 2024
  • Ollama runs great on Docker, but there are just a couple of things to keep in mind. This video covers them all.
    Visit hub.docker.com/r/ollama/ollama for more details.
    Be sure to sign up to my monthly newsletter at technovangelist.com/newsletter
    And if interested in supporting me, sign up for my patreon at / technovangelist
  • Science & Technology

Comments • 130

  • @technovangelist
    @technovangelist  3 months ago +2

    Someone just commented about finding another way to upgrade the container. I can't find the comment now, so if this was you, post again. But no, do not upgrade the install inside a container; that's a whole lot of work for no benefit. The models are stored in the volume you mounted as part of the install, so deleting the image will not affect the models. If you have gone against the recommendations and stored models inside the container, then the best approach is to move them to the correct spot and update the container.
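
    A minimal sketch of that upgrade path, assuming the container is named ollama and the models live in a mounted volume as recommended (reuse whatever -v mapping you used originally):

    docker pull ollama/ollama                 # grab the newest image
    docker stop ollama && docker rm ollama    # remove the old container; the volume is untouched
    docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama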

  • @Kimomaru
    @Kimomaru 1 month ago +11

    I really wish more videos were made like this. No nonsense, gets straight to the point, clear, concise. Thank you.

    • @technovangelist
      @technovangelist  1 month ago +2

      And yet some complain that I take too long and waste time. But thank you so much for the comment. I do appreciate it.

    • @jwerty
      @jwerty 6 days ago

      @@technovangelist Amazing video! Finally, I understand Docker.

  • @EcomGraduates
    @EcomGraduates 3 months ago +1

    How you speak in your videos is refreshing, thank you 🙏🏻

  • @mercadolibreventas
    @mercadolibreventas 3 months ago +4

    Matt, you're a great teacher; no one explains things like you do. They just read the command in one sentence and do not explain the actual function of that command in parts. Lots of videos show how to do something and 75% never work. So thanks so much!

  • @ashwah
    @ashwah 1 month ago +2

    Thanks Matt, this helped me understand the Docker side of things, namely keeping the models in a volume. I will restructure my project based on this. Keep it up ❤

  • @Makumazaan
    @Makumazaan 1 month ago +1

    Much respect for the way you deliver information.

  • @ToddWBucy-lf8yz
    @ToddWBucy-lf8yz 28 days ago +1

    Thank you, Sir! You just took the mystery out of how to set this up right. I love me some Docker. It really helps to keep the work stuff separated from the personal project stuff.

  • @xXWillyxWonkaXx
    @xXWillyxWonkaXx 3 months ago

    Straight to the point, no fluff, very informative. Very updated. You just earned a fan/subscriber. Howdy Matt 🎩

    • @technovangelist
      @technovangelist  3 months ago

      There are some who say I am all fluff, but I try to always be closer to your observation.

  • @tristanbob
    @tristanbob 3 months ago

    This is my new favorite channel! I learned like 10 things just in this video.
    I love learning about AI, modern tools such as docker and tailscale, and modern hosting platforms and services. Thank you!

    • @technovangelist
      @technovangelist  3 months ago +1

      you left off the most important part.... NERF can be expensed!!

    • @tristanbob
      @tristanbob 3 months ago

      Good point! So I learned 11 things :) @technovangelist

  • @Slimpickens45
    @Slimpickens45 3 months ago +1

    🔥good stuff as always Matt!

  • @sampellino
    @sampellino 2 months ago

    A fantastic, clear instructional. Thank you so much! This helped me a ton.

  • @MohammadhosseinMalekpour
    @MohammadhosseinMalekpour 1 month ago

    Thanks, Matt! It was a straightforward tutorial.

  • @TimothyGraupmann
    @TimothyGraupmann 3 months ago

    Learned that containers can be remote, and about the alias. Yet another great video! I need to take advantage of that. I have a bunch of RPi security cameras, and remote containers might make administration even easier!

  • @robertdolovcak9860
    @robertdolovcak9860 3 months ago

    Nice and clear tutorial. Thanks! 😀

  • @ErnestOak
    @ErnestOak 3 months ago +6

    Does it make sense to use Ollama in production as a server?

  • @user-ok9vj5js7e
    @user-ok9vj5js7e 7 days ago

    Thanks for your help!

  • @tiredofeverythingnew
    @tiredofeverythingnew 3 months ago +1

    In the realm of ones and zeros and LLM models, Matt is the undisputed sovereign.

  • @mohammedibrahim-hd2rs
    @mohammedibrahim-hd2rs 1 month ago

    You're amazing, bro.

  • @fuba44
    @fuba44 3 months ago +1

    This is my new favorite content; the way you explain it just beams directly into my brain and I get it right away. Thank you. Is there a way to show support, donations or similar?

    • @technovangelist
      @technovangelist  3 months ago +1

      Folks have asked me about that. I’ll be looking into something like Patreon soon.

    • @technovangelist
      @technovangelist  3 months ago +1

      The big thing for now is to just share the video with everyone you know.

    • @technovangelist
      @technovangelist  2 months ago

      Well, I do have that Patreon now. Just set it up: patreon.com/technovangelist

  • @bjaburg
    @bjaburg 3 months ago

    There are not many people who can explain these steps in such an easy and entertaining way as you do, Matt. I often pride myself on being able to do so, but you can be my teacher. I often find myself watching the progress bar because I don't want it to end (seriously :-))!
    A request: could you do an explainer video on how to train a model (say Microsoft/Phi-2) on your own dataset and deploy the trained model? OpenAI makes it super easy by uploading a JSONL file, and after a while it 'returns' the trained model. But I want to train my own models.
    I have been looking around YT but get lost in parameters, incorrect JSONL files (or CSV), etc. Surely, this must be easier. (Hopefully your answer is "it is easier, and don't call me Shirley".)
    Thanks so much again. You have a happy subscriber (and many more to come).
    Kind regards,
    Bas

  • @95jack44
    @95jack44 3 months ago

    Searching for a fully air-gapped install on Docker to use on Kubernetes. This is a start ^^. Thx

  • @sushicommander
    @sushicommander 3 months ago

    Great video. Now I'm curious about how you set up Ollama on Brev... What is your recommended setup & host service for using Ollama as an endpoint?

  • @Lemure_Noah
    @Lemure_Noah 3 months ago

    Excellent, Matt!
    For some reason, I had to run docker commands with "sudo" to use my GPUs.

    • @gokudomatic
      @gokudomatic 3 months ago

      That sounds like your user is not in the right group. I once had issues like that, and it was a matter of not being in the docker group. Now I can use my GPU in my Docker container.

    • @technovangelist
      @technovangelist  3 months ago

      Good answer. I knew it but couldn't remember; that's the one I was trying to recall.
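
      A minimal sketch of that fix on Linux, assuming the standard docker group exists (log out and back in, or use newgrp, for it to take effect):

      sudo usermod -aG docker $USER   # add your user to the docker group
      newgrp docker                   # pick up the new group in the current shell
      docker run --rm hello-world     # should now work without sudo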

  • @JM-sn5eb
    @JM-sn5eb 1 month ago

    This is exactly what I've been looking for!
    Could you please tell (or maybe create a video on) how to use Ollama completely offline? I have a PC that I cannot connect to the internet.
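
    One approach, sketched under the assumption that you also have an internet-connected machine: Ollama only needs the internet to pull models, so you can pull there and copy the models directory (the mounted volume, or ~/.ollama on a native install) over to the offline PC. Paths and the llama3 model name are placeholders.

    # on the connected machine
    docker exec ollama ollama pull llama3
    docker cp ollama:/root/.ollama ./ollama-data
    # move ./ollama-data to the offline PC (USB drive, etc.), then on that PC:
    docker run -d -v /path/to/ollama-data:/root/.ollama -p 11434:11434 --name ollama ollama/ollama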

  • @chandup
    @chandup 3 months ago

    Nice video. Could you also please make a demo video on how to use Ollama via Nix (nix shell or on NixOS)?

  • @devgoneweird
    @devgoneweird 3 months ago

    Is it possible to limit the resource consumption of Ollama?
    I'm looking for a way to run a background computation, and I don't really care how much time it takes (as long as it can process a stream's average load), but it would be annoying if it hung the main activity on the machine.
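
    Docker's standard resource flags apply to the Ollama container like any other; a minimal sketch, with example values you would tune to your machine:

    docker run -d --cpus="4" --memory="8g" \
      -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama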

  • @brentfergs
    @brentfergs 2 months ago

    Great video as always, Matt, I love them. I would like to know how to load a custom model in Docker with a Modelfile. Thank you so much.

    • @technovangelist
      @technovangelist  2 months ago

      Same way as without Docker: you create the model using the Modelfile, then run it. Or am I missing something?
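
      A minimal sketch of doing that against the running container (the model name mymodel and the local ./Modelfile path are placeholders):

      docker cp ./Modelfile ollama:/tmp/Modelfile
      docker exec -it ollama ollama create mymodel -f /tmp/Modelfile
      docker exec -it ollama ollama run mymodel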

  • @AnkitK-wi3wk
    @AnkitK-wi3wk 1 month ago +1

    Hi Matt, your videos are super useful and right on point. Thank you for putting this together.
    I have a quick question on this topic. I have created a RAG Streamlit app in Python using Ollama llama3 and ChromaDB. The app runs fine on my Mac localhost, but I wanted to create a Docker image of this app. I am unable to figure out how to include Ollama llama3 in my Docker image. Can you help point to any resources which can guide me on this, or cover this in one of the topics?
    Again, thanks a mil for the content. Great stuff!!! Cheers
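
    One common pattern, sketched with placeholder names: keep Ollama in its own container and have the app talk to it over a shared Docker network instead of baking the model into the app image. Here my-rag-app and the OLLAMA_HOST variable your app reads for the base URL are assumptions to adapt to your own code.

    docker network create rag-net
    docker run -d --network rag-net --name ollama -v ollama:/root/.ollama ollama/ollama
    docker exec ollama ollama pull llama3
    docker build -t my-rag-app .      # your Streamlit app's own Dockerfile
    docker run -d --network rag-net -p 8501:8501 -e OLLAMA_HOST=http://ollama:11434 my-rag-app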

  • @CaptZenPetabyte
    @CaptZenPetabyte 24 days ago

    I've been running a lot via Docker, but when I found out about the difficulty of GPU pass-through (on any machine) I have been swapping things over to Proxmox, which does have GPU pass-through *and* can also use the CPU to emulate a GPU as needed... what do you think about running on Proxmox?

  • @MarvinBo
    @MarvinBo 3 months ago

    Make your Ollama even better by installing Open WebUI in a second container. This even runs on my Raspberry Pi 5!

    • @technovangelist
      @technovangelist  3 months ago

      Some like the WebUI. But that's a personal thing. It's an alternative.

  • @ricardofernandez2286
    @ricardofernandez2286 2 months ago

    Hi Matt, thank you for such a clear and concise explanation!!
    I have a question that may or may not apply in this context, and I'll let you be the judge of it.
    I'm running on CPU on an 8-virtual-core server with 30 GB RAM and an NVMe disk on Ubuntu 22.04, and the performance is kind of poor (and I clearly understand that a GPU would be the straightforward way to solve this).
    But I've noticed that when I run the models, for example Mistral 7B, Ollama only uses about half the CPUs available and less than 1 GB of RAM. I'm not sure why it is not using all the resources available, or whether using them would improve the performance. Anyway, it would be great to have your advice on this, and if it is something that can be improved/configured, how would you suggest doing it?
    Thank you very much!!!

    • @technovangelist
      @technovangelist  2 months ago

      You will need a GPU. Maybe a faster CPU would help, but the GPU is going to be the easier approach. You will see 1 or 2 orders of magnitude improvement adding even a pretty cheap GPU from Nvidia or AMD.

    • @ricardofernandez2286
      @ricardofernandez2286 2 months ago

      @@technovangelist Thank you! I know the GPU is the natural way to go.
      I was just wondering why it is using less than half the resources available when it has plenty of extra CPU and RAM, and whether using those idle resources could improve the performance by at least some percentage.
      And unfortunately I can't add a GPU to this configuration. My CPUs are AMD EPYC 7282 16-core processors, which I think are quite nice CPUs.
      Thank you!!
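
      For what it's worth, one knob that may be relevant here is the num_thread option on a request, which controls how many CPU threads Ollama uses for inference; a sketch (the value 16 is just an example to experiment with):

      curl http://localhost:11434/api/generate -d '{
        "model": "mistral",
        "prompt": "Why is the sky blue?",
        "options": { "num_thread": 16 }
      }'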

  • @xDiKsDe
    @xDiKsDe 3 months ago

    Hey Matt, appreciate your content - it has been very helpful for getting everything running so far! I am on a Windows 11 PC and managed to get Ollama + AnythingLLM running on Docker and communicating with each other. Now I want to try to get LLMs from Hugging Face to run in the dockerized Ollama. I saw how it works if you have Ollama installed directly on the system. But how do I approach this when using Docker?
    Thanks in advance and keep it up 👏

    • @technovangelist
      @technovangelist  3 months ago

      Is the model not already in the library? You can import it, but it can be a bit of extra work. Check out the import doc in the docs folder of the Ollama repository.

    • @xDiKsDe
      @xDiKsDe 3 months ago

      Ah yes, they are, but I meant custom-trained LLMs - I stumbled across the open_llm_leaderboard and wanted to give those a try - will check out the import doc, thanks!
      @@technovangelist
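
      A rough sketch of one way to do that with the dockerized setup, assuming you have downloaded a GGUF file (model.gguf and my-hf-model are placeholder names):

      docker cp ./model.gguf ollama:/tmp/model.gguf
      printf 'FROM /tmp/model.gguf\n' > Modelfile
      docker cp ./Modelfile ollama:/tmp/Modelfile
      docker exec -it ollama ollama create my-hf-model -f /tmp/Modelfile
      docker exec -it ollama ollama run my-hf-model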

  • @nagasaivishnu9680
    @nagasaivishnu9680 3 months ago

    Running the Docker container as the root user is not secure. Is there any way to run it as a non-root user?

  • @mrRobot-qm3pq
    @mrRobot-qm3pq 3 months ago

    Does it consume fewer resources and run better with OrbStack instead of Docker Desktop?

  • @RupertoCamarena
    @RupertoCamarena 3 months ago

    Did you hear about Jan AI? A tutorial for Docker would be great. Thanks

  • @s.b.605
    @s.b.605 1 month ago

    How do you swap models in the same container? I think I'm doing it wrong and it's affecting my container memory.

  • @kevyyar
    @kevyyar 3 months ago

    Just found this channel. Could you make a video tutorial on how to use it inside VS Code for code completions?

  • @Tarun_Mamidi
    @Tarun_Mamidi 3 months ago

    Cool tutorial. Can you also show how we can integrate the Ollama Docker container with other programs, say, a LangChain script inside Docker? How do we connect the two, together or separately?
    Thanks!

    • @technovangelist
      @technovangelist  3 months ago

      Would love to see a good example of using LangChain. Often folks use it for RAG, where it only adds complexity. Do you have a good use case?

  • @jonascale
    @jonascale 22 days ago

    So, I think you cleared up most of the problems I have been having trying to get this set up. But I have one last one that I just can't seem to get past. My setup is on Proxmox, and I first tried to create an LXC container; once I had NVIDIA passthrough working for my GPU, I installed Ollama and downloaded my first model. That all went fine. Then I tried to see if the API was listening on port 11434 by opening a browser and going to the address:11434. According to the documentation I should get a message that Ollama is ready. Unfortunately, I get no errors; the page simply doesn't open. So I approached it from the other side and just created an LXC and installed Docker and Portainer on it. Much to my surprise, when I navigated to the address I got the message that Ollama is ready. My question is: why? I'm sure this is something easy that I'm missing, but 24 hours later I am still not sure why. Any ideas?
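
    One guess at the cause, sketched rather than confirmed: a native Linux install of Ollama listens on 127.0.0.1 by default, so it only answers requests from the same host, while the Docker image publishes the port externally with -p 11434:11434. On the LXC install, setting OLLAMA_HOST to bind all interfaces and restarting the service may be all that is needed (the systemd unit name assumes the standard Linux install script):

    sudo systemctl edit ollama.service
    # add under [Service]:
    #   Environment="OLLAMA_HOST=0.0.0.0"
    sudo systemctl restart ollama
    curl http://<lxc-ip>:11434    # should now answer from another machine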

  • @michaelberg7201
    @michaelberg7201 3 months ago

    I recently had the opportunity to try Ollama in Docker and it worked pretty much as shown in this video. I do think it would be nice if it were somehow possible to start a container and have it ready to serve a model immediately, but I couldn't find an easy way to do this. You basically have to run one docker command to start Ollama, then wait a bit, then run another docker exec command to tell Ollama to load whatever model you happen to need. How do I achieve the same thing using just one single docker command?
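
    One way to get close to a single command is to override the image's entrypoint with a small shell one-liner that starts the server and then pulls the model; a sketch (the sleep is a crude wait, and llama3 is just an example model):

    docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama \
      --entrypoint /bin/sh ollama/ollama \
      -c "ollama serve & sleep 3; ollama pull llama3; wait"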

  • @alibahrami6810
    @alibahrami6810 3 months ago

    Is it possible to manage multiple instances of Ollama on Docker to scale Ollama for production? How?

    • @technovangelist
      @technovangelist  3 months ago

      You could, but it will result in lower performance for everyone.

  • @vishalnagda7
    @vishalnagda7 2 months ago

    Could you kindly assist me in clarifying how to specify the model name when running the Ollama Docker command?
    For instance, I aim to use the mistral and llama2:13b models in my project.
    Thus, I would ask our DevOps team to launch an Ollama container configured with these specific models.
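
    The model names aren't part of the run command itself; one minimal sketch is to start the container and then pull each model into it (or wrap these steps in a startup script for the ops team):

    docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
    docker exec ollama ollama pull mistral
    docker exec ollama ollama pull llama2:13b
    docker exec ollama ollama list    # verify both models are available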

  • @Lemure_Noah
    @Lemure_Noah 3 months ago

    I would like to suggest that Ollama support embeddings when it becomes available through the REST API. If they really chose nomic-ai/nomic-embed-text-v1.5-GGUF, it would be perfect, as this model is multilingual.

    • @technovangelist
      @technovangelist  3 months ago

      It does support embeddings, using nomic-embed-text. Check out the previous video. It covers that topic.
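
      A minimal sketch of calling it against the dockerized server (the prompt text is arbitrary):

      docker exec ollama ollama pull nomic-embed-text
      curl http://localhost:11434/api/embeddings -d '{
        "model": "nomic-embed-text",
        "prompt": "The sky is blue because of Rayleigh scattering"
      }'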

  • @csepartha
    @csepartha 3 months ago

    Kindly make a tutorial on fine-tuning an open-source LLM on data from many PDFs. The fine-tuned LLM must be able to answer questions from the PDFs accurately.

  • @michaeldoyle4222
    @michaeldoyle4222 2 months ago

    Any idea where I can see the logs for a local install (i.e. not Docker) on Mac....

    • @technovangelist
      @technovangelist  2 months ago

      If it's a local install that isn't Docker, there is no Docker log.
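
      For reference, a native macOS install writes its server log under the Ollama directory in your home folder; per the Ollama troubleshooting docs it can be followed with:

      tail -f ~/.ollama/logs/server.log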

  • @SharunKumar
    @SharunKumar 3 months ago

    For Windows, the recommended way would be to use WSL(2), since that's a container in itself.

    • @technovangelist
      @technovangelist  3 months ago

      Well, the recommended way on Windows is the native install. But after that it's Docker. And WSL is a VM, not a container. Ubuntu on WSL is a container that runs inside the WSL VM.

  • @AlokSaboo
    @AlokSaboo 3 months ago

    Loved the video… can you do something similar for LocalAI? Thanks!

    • @technovangelist
      @technovangelist  3 months ago

      Hmm. Never heard of it before now. I’ll take a look

    • @AlokSaboo
      @AlokSaboo 3 months ago

      @@technovangelist github.com/mudler/LocalAI - similar to Ollama in many respects. One more tool for you to learn :)

  • @kiranpallayil8650
    @kiranpallayil8650 29 days ago

    Would Ollama still work on a machine with no graphics card?

    • @technovangelist
      @technovangelist  29 days ago

      Absolutely. It will just be 1-2 orders of magnitude slower. The work models do requires a lot of math that GPUs really help accelerate.

  • @lancemarchetti8673
    @lancemarchetti8673 3 months ago

    Hey guys... Mistral just launched their new model, named Large!

  • @AdrienSales
    @AdrienSales 3 months ago

    Hi, would you also share Podman commands? Did you give it a try?

    • @technovangelist
      @technovangelist  3 months ago +1

      I tried it a bit when Bret Fisher and I had them on the show we hosted together, but it didn't give me much reason to stop using Docker. I didn't see any benefit.

    • @AdrienSales
      @AdrienSales 3 months ago

      @@technovangelist Thanks for the feedback. It was not about dropping Docker, but rather being sure both work, since in some cases Podman is used (because of rootless mode, e.g.) and not Docker. So it may help some of us spread Ollama even in those cases in an enterprise context ;-p

    • @technovangelist
      @technovangelist  3 months ago

      I thought they were supposed to be command-line compatible, so it should be the same, right? Try it and let us know.
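
      For anyone trying it, Podman's CLI is intentionally Docker-compatible, so the CPU-only command from the video should translate nearly verbatim; a sketch (GPU access under rootless Podman typically goes through the NVIDIA Container Toolkit's CDI support rather than --gpus):

      podman run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
      podman exec -it ollama ollama run llama3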

  • @user-kg1di9ed3z
    @user-kg1di9ed3z 6 days ago

    Can I run Ollama without having access to the internet? I would like to run it locally, with no internet connection at all.

  • @kaaveh
    @kaaveh 3 months ago

    I wish there were a clean way to launch an Ollama Docker container with a preconfigured set of models, so it would serve and then immediately pull the models. We are overriding the image's entrypoint right now to run a shell script that does this…

  • @arberstudio
    @arberstudio 3 months ago

    Some of the model links are broken, so I had to add them to requirements and edit the Dockerfile.

    • @technovangelist
      @technovangelist  3 months ago

      What do you mean by that? Is this something in a file you made?

    • @arberstudio
      @arberstudio 3 months ago

      @@technovangelist I was referring to the Ollama WebUI, perhaps this isn't the same repo?

    • @technovangelist
      @technovangelist  3 months ago +1

      Different product, made by unrelated folks.

  • @akshaypachbudhe3319
    @akshaypachbudhe3319 3 months ago

    How do I connect this Ollama server with a Streamlit app and run both on Docker?

    • @madhusudhanreddy9157
      @madhusudhanreddy9157 8 days ago

      From the question, I understood you should have two containers with different ports:
      1. Ollama
      2. Streamlit app
      Run them separately and access the Ollama APIs from the front-end app.

  • @Vinn.V
    @Vinn.V 24 days ago

    It's better to write a Dockerfile and package it as a Docker image.

  • @CC-zr6fp
    @CC-zr6fp 3 days ago

    I am wondering where I am going wrong. I followed along step-by-step, but when I run ollama run llama3 I get 'ollama: command not found'.

    • @technovangelist
      @technovangelist  3 days ago

      Are you running that in the container? If not, you need to run the right docker command. But it's best not to use Docker if you aren't a Docker user.

    • @CC-zr6fp
      @CC-zr6fp 3 days ago

      @@technovangelist Thank you for the quick reply. I was running it in one; however, I decided it was going to be faster for me to use a spare NUC I had running Windows. The LXC I was testing it in, even though I gave it 8 cores, was still sort of slow, even using llama3:text.

    • @technovangelist
      @technovangelist  3 days ago

      Without a GPU it will be slow.
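
      For anyone hitting the same error: the ollama CLI lives inside the container, so run it through docker exec (assuming the container is named ollama as in the video):

      docker exec -it ollama ollama run llama3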

  • @bodyguardik
    @bodyguardik 3 months ago

    With the WSL2 Docker version, DON'T PUT MODELS OUTSIDE WSL2 on a mounted Windows drive - I/O performance will be about 15x slower.

    • @technovangelist
      @technovangelist  3 months ago

      Yup. Pretty standard stuff for Docker and virtualization. Docker on WSL with Ubuntu means the Ollama container is running in the Ubuntu container on the WSL virtual machine. Each level of abstraction slows things down, and translation between levels is going to be slow.

  • @ravitejarao6201
    @ravitejarao6201 3 months ago

    Hi bro.
    When I try to deploy Ollama on AWS Lambda with an ECR Docker image I am getting an error; can you please help me?
    Error: http ConnectError: [Errno 111] Connection refused
    Thank you

    • @technovangelist
      @technovangelist  3 months ago

      Need a lot more info. Where do you see that? When in the process? Is running a container like that even going to be possible? Do you have access to a GPU with Lambda? If not, it's going to be an expensive way to go.

  • @florentflote
    @florentflote 3 months ago

  • @bobuputheeckal2693
    @bobuputheeckal2693 1 month ago

    How do I run it as a Dockerfile?

    • @technovangelist
      @technovangelist  1 month ago

      Yes, that's what this video shows.

    • @bobuputheeckal2693
      @bobuputheeckal2693 1 month ago

      @@technovangelist
      I mean, how do I run it from a Dockerfile, not as a set of docker commands?

    • @technovangelist
      @technovangelist  1 month ago

      The docker commands run an image that was built from a Dockerfile.

  • @basilbrush7878
    @basilbrush7878 3 months ago

    Mac not allowing GPU pass-through is a huge limitation.

    • @technovangelist
      @technovangelist  3 months ago +1

      Docker has known about the issue for a long time. But mostly it's because there aren't Linux drivers for the Apple Silicon GPU.

  • @95jack44
    @95jack44 3 months ago

    If anyone has insights on a particular LLM model that has a low hallucination rate for Kubernetes-native resource generation, please leave me a comment ;-). Thx

    • @technovangelist
      @technovangelist  3 months ago

      Usually when someone has a hard time with the output of a model, it points to a bad prompt rather than a bad model.

    • @jbo8540
      @jbo8540 3 months ago

      Mistral:Instruct is a solid choice for a range of tasks

  • @kwokallsafe5642
    @kwokallsafe5642 15 days ago

    VID SUGGESTION ~ (Resolve Error Response: Invalid Volume Specification) - Thanks
    test@xz97:~$ docker run -d --gpus=all -v /home/test/models/:root/.ollama -p 11434:11434 --name ollama ollama/ollama
    docker: Error response from daemon: invalid volume specification: '/home/test/_models/:root/.ollama': invalid mount config for type "bind": invalid mount path: 'root/.ollama' mount path must be absolute.

    • @technovangelist
      @technovangelist  12 days ago

      That's more of a support request.... the error message is all you need. You specified a relative path rather than an absolute one. Refer to the docs on Docker Hub for the image.

    • @kwokallsafe5642
      @kwokallsafe5642 12 days ago

      @@technovangelist - Thanks Matt for your reply - Discovered there is a "/" slash missing before root - (problem solved). Thanks again.
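
      For reference, the working form of that command with the absolute path inside the container (the host path is the one from the original post):

      docker run -d --gpus=all -v /home/test/models/:/root/.ollama -p 11434:11434 --name ollama ollama/ollama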

  • @themax2go
    @themax2go 1 month ago

    Waaiiiit wait wait a sec... I specifically remember in a vid (don't remember which, it's been months) that on Mac, in order for Ollama to utilize "Metal" 3D acceleration, it needs to run in Docker... strange 🫤

    • @technovangelist
      @technovangelist  1 month ago

      Sorry. You must have remembered that wrong. Docker on Mac with Apple Silicon has no access to the GPU. And Ollama doesn't work with the GPU on Intel Macs either.