Running Mistral AI on your machine with Ollama

  • Published 6 Jul 2024
  • In this video, we'll delve into Mistral AI's latest groundbreaking language model and explore its capabilities using Ollama, a tool designed for running LLMs right on your local machine. Dive deep with me as we go through the process of downloading the model, executing commands, performing sentiment analysis, and extracting entities.
    #MistralAI #ollama #llms #generativeai #largelanguagemodels #llm #llamaindex
    Timings ⏰
    00:00 - Mistral AI's New Model
    00:15 - Ollama
    00:25 - Browsing Ollama Models
    00:53 - Running Ollama
    01:56 - Answering factual questions
    02:56 - VAR goes wrong in Premier League
    03:23 - Summarisation
    03:54 - Bullet points
    04:05 - Categorisation
    04:27 - Ollama HTTP API
    05:06 - Using Llama Index
    06:07 - Final thoughts on Mistral and Ollama
    Resources 🛠️
    Blog - www.markhneedham.com/blog/202...
    Ollama - ollama.ai/
    Mistral AI - mistral.ai/
    BBC article about VAR - www.bbc.co.uk/sport/football/...
  • Science & Technology
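
  • The 04:27 chapter calls the Ollama HTTP API. As a rough sketch of what that looks like from Python (assuming Ollama is serving on its default port 11434 and the model has already been pulled with ollama pull mistral):

    import requests

    # Ask the locally running Mistral model a question via Ollama's generate endpoint
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "mistral",
            "prompt": "Why is the sky blue?",
            "stream": False,  # return one JSON object instead of a token stream
        },
    )
    print(response.json()["response"])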

Comments • 28

  • @AdrienSales
    @AdrienSales 9 months ago +5

    Very cool and efficient tutorial. Thanks a lot for all your content and for the time it saves us in learning useful skills.

  • @ziohalex
    @ziohalex 3 months ago

    Thanks for the great video!

  • @PabloArellanoJr
    @PabloArellanoJr 9 months ago +2

    More videos like this, please!

  • @sitrakaforler8696
    @sitrakaforler8696 9 months ago +1

    Nice!

  • @JusticeConder
    @JusticeConder 9 months ago +1

    Great video

  • @oryxchannel
    @oryxchannel 9 months ago

    subbed

  • @AlperYilmaz1
    @AlperYilmaz1 9 months ago +1

    Thanks for the video. It's the first time I've been able to run a model locally. The Hugging Face route described in other videos failed every time, and the other approaches were too cumbersome.
    If you could show us 1) asking questions about documents (LangChain or LlamaIndex maybe?) and 2) fine-tuning the local models, that would be awesome.

    • @learndatawithmark
      @learndatawithmark  8 months ago

      It was the first time it properly worked for me as well. I tried a few of the bigger models on Hugging Face and they would run for about 2 minutes without producing any output!
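
      A minimal sketch of question 1 (asking questions about documents with LlamaIndex on top of a local Ollama model), assuming a recent llama-index install with the llama-index-llms-ollama and llama-index-embeddings-huggingface packages; ./docs is a placeholder directory of your own files:

      from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
      from llama_index.embeddings.huggingface import HuggingFaceEmbedding
      from llama_index.llms.ollama import Ollama

      # Point LlamaIndex at the local Mistral model served by Ollama
      Settings.llm = Ollama(model="mistral", request_timeout=120.0)
      # Embeddings also need to run locally; this is one commonly used small model
      Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

      documents = SimpleDirectoryReader("./docs").load_data()  # placeholder path
      index = VectorStoreIndex.from_documents(documents)
      print(index.as_query_engine().query("What are the key points of these documents?"))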

  • @rayauxey
    @rayauxey 7 months ago

    I think it listed the 2018 FIFA World Cup because it was the first tournament to use VAR technology.

  • @MrMoonsilver
    @MrMoonsilver 9 months ago +2

    My models in WSL 2 seem to be running entirely on the CPU and RAM, even though I've installed the necessary NVIDIA drivers. Is there a special setting to enable the GPU in Ollama?

    • @Equilibrier
      @Equilibrier 7 months ago

      I'm interested in the answer to this too.

  • @znelson32
    @znelson32 6 months ago

    How do you load fine-tuned adapters on top of the base Mistral model (they're in safetensors format)?

    • @learndatawithmark
      @learndatawithmark  6 months ago

      I haven't tried that. If you have adapters in safetensors format, I think you'd be able to use Hugging Face's transformers library to load the model - you wouldn't need to put it into Ollama.
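
      A minimal sketch of that transformers route, assuming the adapters were trained with PEFT/LoRA (the adapter path here is a placeholder):

      from transformers import AutoModelForCausalLM, AutoTokenizer
      from peft import PeftModel

      base_id = "mistralai/Mistral-7B-v0.1"
      tokenizer = AutoTokenizer.from_pretrained(base_id)
      base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

      # PeftModel picks up adapter_config.json + adapter_model.safetensors from the directory
      model = PeftModel.from_pretrained(base_model, "./my-adapter")  # placeholder path

      inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
      outputs = model.generate(**inputs, max_new_tokens=20)
      print(tokenizer.decode(outputs[0], skip_special_tokens=True))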

  • @chrisBruner
    @chrisBruner 9 months ago +1

    Ollama looks interesting. I tried it out but couldn't figure out how to tell it where I've already downloaded the files. I ended up downloading Mistral, but now I don't know where it put it. It also installed ollama in the /usr/local/bin directory - so what do I do to uninstall it? I like software where I know where it's located and where it's putting its data.

    • @learndatawithmark
      @learndatawithmark  8 months ago

      From what I can tell, the models are being downloaded to the ~/.ollama directory:
      $ ls -alh ~/.ollama/models/blobs
      total 133232976
      drwxr-xr-x@ 54 markhneedham staff 1.7K 12 Oct 17:14 .
      drwxr-xr-x@ 4 markhneedham staff 128B 28 Sep 13:39 ..
      -rw-r--r--@ 1 markhneedham staff 160B 12 Oct 15:56 sha256:04f603753dacd8b5f855cdde37290d26ce45b283114fb40c00646c3f063333f4
      -rw-r--r--@ 1 markhneedham staff 307B 28 Sep 14:29 sha256:0740207dce2915a5d9e771e4927d40778088b93d401f38d4e6b028c658e4bfc4
      -rw-r--r--@ 1 markhneedham staff 3.6G 28 Sep 14:11 sha256:135cafba8bf5adf008d4f1d3b80c299fdfdfddf859e22bcd38aadab5f09e5c7a
      -rw-r--r--@ 1 markhneedham staff 3.8G 12 Oct 12:16 sha256:155ebc41bb3029316fd71d42843a5326876ae425b07a4039c15953ecf88baabc
      -rw-r--r--@ 1 markhneedham staff 455B 12 Oct 17:01 sha256:257f3366d87f7a7e8a37a00f90e6d973181100b72dac871a44e662e427fba2cb
      -rw-r--r--@ 1 markhneedham staff 530B 10 Oct 17:25 sha256:29d2ddca2e0def928faa80299680d4ed2a090fa2b092f185f27fcc8de4a15ac7
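
      To see or remove downloaded models without touching the blobs directory by hand, the same local HTTP API works (a sketch, assuming the default port; deleting a model frees its blobs under ~/.ollama):

      import requests

      # List the models Ollama has downloaded locally
      for model in requests.get("http://localhost:11434/api/tags").json()["models"]:
          print(model["name"], model["size"])

      # Remove a model you no longer want
      requests.delete("http://localhost:11434/api/delete", json={"name": "mistral"})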

  • @hectorferronato4861
    @hectorferronato4861 9 months ago

    What are the specs of the machine you ran it on? RAM, GPUs, CPU cores, etc.

    • @learndatawithmark
      @learndatawithmark  8 months ago

      Apple M1 Max:
      Chipset Model: Apple M1 Max
      Type: GPU
      Bus: Built-In
      Total Number of Cores: 32
      Vendor: Apple (0x106b)

  • @user-kn1my2qk2c
    @user-kn1my2qk2c 4 months ago

    Is it fine-tunable or not, Mark? I need to fine-tune this model.

    • @learndatawithmark
      @learndatawithmark  4 months ago

      I think so, yeah. What would you fine-tune it for? I haven't found a reason to try fine-tuning one myself yet!

  • @familygifts123
    @familygifts123 8 months ago

    What are the specs of the machine you ran it on? RAM, GPUs, CPU cores, etc.
    My app loads so slowly.

    • @learndatawithmark
      @learndatawithmark  8 months ago

      I'm using a Mac M1:
      Apple M1 Max:
      Chipset Model: Apple M1 Max
      Type: GPU
      Bus: Built-In
      Total Number of Cores: 32
      Vendor: Apple (0x106b)

    • @familygifts123
      @familygifts123 8 months ago

      @learndatawithmark Oh, good job. I'm using Windows and created a Docker container for testing, but it runs so slowly.

    • @learndatawithmark
      @learndatawithmark  8 months ago

      @familygifts123 Oh I see. I wonder whether it's not using the host OS GPU when running in Docker?

    • @familygifts123
      @familygifts123 8 months ago

      @learndatawithmark Yes, I see Docker using only the CPU, even though my computer has an NVIDIA Quadro M1000M.

  • @stanTrX
    @stanTrX 3 months ago

    Which one is better: Llama, Mistral, Gemma, or Orca?

    • @learndatawithmark
      @learndatawithmark  2 months ago +1

      I guess the higher-parameter Llama models will be better, but as I understand it, the fine-tuning that Orca does seems to improve the initial models.