Running Mistral AI on your machine with Ollama
- Added 6 Jul 2024
- In this video, we'll delve into Mistral AI's latest groundbreaking language model and explore its capabilities using Ollama, a tool designed for running LLMs right on your local machine. Dive deep with me as we go through the process of downloading the model, executing commands, performing sentiment analysis, and extracting entities.
#MistralAI #ollama #llms #generativeai #largelanguagemodels #llm #llamaindex
Timings ⏰
00:00 - Mistral AI's New Model
00:15 - Ollama
00:25 - Browsing Ollama Models
00:53 - Running Ollama
01:56 - Answering factual questions
02:56 - VAR goes wrong in Premier League
03:23 - Summarisation
03:54 - Bullet points
04:05 - Categorisation
04:27 - Ollama HTTP API
05:06 - Using Llama Index
06:07 - Final thoughts on Mistral and Ollama
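For reference, the Ollama HTTP API covered at 04:27 boils down to a single JSON endpoint on the local server. A minimal Python sketch, assuming Ollama is running on its default port (11434) and the mistral model has already been pulled:

import requests

# Ask the local Ollama server for a completion. stream=False returns
# one JSON object instead of a stream of partial responses.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "mistral",
        "prompt": "What is the sentiment of this sentence: 'I loved this video'?",
        "stream": False,
    },
)
print(response.json()["response"])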
Resources 🛠️
Blog - www.markhneedham.com/blog/202...
Ollama - ollama.ai/
Mistral AI - mistral.ai/
BBC article about VAR - www.bbc.co.uk/sport/football/...
Very cool and efficient tutorial. Thanks a lot again for all your content and for the time you save us in learning useful skills.
No problem - happy to help :)
Thanks for the great video!
More videos like this please!!!!!!
Nice!
Great video
Thanks! Glad you liked it :)
subbed
Thanks for the video. It's the first time I've been able to run a model locally. The Hugging Face route described in other videos failed every time, and the other routes were too cumbersome.
If you can show us 1) asking questions about documents (LangChain or Llama Index maybe?) and 2) fine-tuning the local models, that would be awesome.
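For (1), a rough sketch of what document Q&A with Llama Index over a local Ollama model could look like - assuming a ~0.9-era llama-index install (newer releases moved these imports into llama_index.core), with "./docs" and the query as placeholders:

from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms import Ollama

# Point Llama Index at the local Ollama server and use local embeddings
llm = Ollama(model="mistral")
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")

# Index everything in ./docs (placeholder path) and ask a question
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
print(index.as_query_engine().query("What are the key points in these documents?"))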
First time it properly worked for me. I tried a few of the bigger models on Hugging Face and they'd run for like 2 minutes without producing any output!
I think it listed the FIFA World Cup 2018 because it was the first tournament to use VAR technology
My models in WSL 2 seem to be running entirely on CPU and RAM, although I have installed the necessary NVIDIA drivers. Is there a special setting to enable the GPU in Ollama?
I'm interested in the same question.
How do you load fine-tuned adapters on top of the base Mistral model (they're in safetensors format)?
I haven't tried that. If you have data in safetensors format, I think you'd be able to use Hugging Face's transformers library to load the model - you wouldn't need to put it into Ollama.
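A sketch of that route, assuming the adapters were trained with PEFT/LoRA (the model name and adapter path below are placeholders):

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base Mistral model, then layer the safetensors adapter on top
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
model = PeftModel.from_pretrained(base, "path/to/adapter")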
Ollama looks interesting. I tried it out but couldn't figure out how to tell it where I've already downloaded the files. I ended up loading Mistral, but now I don't know where it put it. It also installed ollama in the /usr/local/bin directory, and now how do I uninstall it? I like code where I know where it's located and where it's putting its data.
From what I can tell, the models are being downloaded to the ~/.ollama directory:
$ ls -alh ~/.ollama/models/blobs
total 133232976
drwxr-xr-x@ 54 markhneedham staff 1.7K 12 Oct 17:14 .
drwxr-xr-x@ 4 markhneedham staff 128B 28 Sep 13:39 ..
-rw-r--r--@ 1 markhneedham staff 160B 12 Oct 15:56 sha256:04f603753dacd8b5f855cdde37290d26ce45b283114fb40c00646c3f063333f4
-rw-r--r--@ 1 markhneedham staff 307B 28 Sep 14:29 sha256:0740207dce2915a5d9e771e4927d40778088b93d401f38d4e6b028c658e4bfc4
-rw-r--r--@ 1 markhneedham staff 3.6G 28 Sep 14:11 sha256:135cafba8bf5adf008d4f1d3b80c299fdfdfddf859e22bcd38aadab5f09e5c7a
-rw-r--r--@ 1 markhneedham staff 3.8G 12 Oct 12:16 sha256:155ebc41bb3029316fd71d42843a5326876ae425b07a4039c15953ecf88baabc
-rw-r--r--@ 1 markhneedham staff 455B 12 Oct 17:01 sha256:257f3366d87f7a7e8a37a00f90e6d973181100b72dac871a44e662e427fba2cb
-rw-r--r--@ 1 markhneedham staff 530B 10 Oct 17:25 sha256:29d2ddca2e0def928faa80299680d4ed2a090fa2b092f185f27fcc8de4a15ac7
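If you just want to delete a model rather than digging through the blobs directory, the CLI has commands for that - 'ollama list' shows what's installed and 'ollama rm mistral' removes a model (mistral here is just an example name).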
What are the specs you used to run it? RAM, GPUs, CPU cores, etc.?
Apple M1 Max:
Chipset Model: Apple M1 Max
Type: GPU
Bus: Built-In
Total Number of Cores: 32
Vendor: Apple (0x106b)
Is it fine-tunable or not, Mark? I need to fine-tune this model.
I think so, yeah. What would you fine-tune it for? I haven't found a reason to try fine-tuning one myself yet!
What are the specs you used to run it? RAM, GPUs, CPU cores, etc. My app loads so slowly.
I'm using a Mac M1:
Apple M1 Max:
Chipset Model: Apple M1 Max
Type: GPU
Bus: Built-In
Total Number of Cores: 32
Vendor: Apple (0x106b)
Oh, good job. I'm using Windows and created a Docker container for testing, but it runs so slowly. @learndatawithmark
@familygifts123 Oh I see. I wonder whether it's not using the host OS GPUs when running inside Docker?
@learndatawithmark Yes sir, I see Docker using the CPU only, even though my computer also has an NVIDIA Quadro M1000M.
Which one is better: Llama, Mistral, Gemma, or Orca?
I'd guess the higher-parameter Llama models will be better, but as I understand it, the fine-tuning that Orca does seems to improve the initial models.