Fine Tune LLaMA 2 In FIVE MINUTES! - "Perform 10x Better For My Use Case"
- Published September 11, 2023
- Sign up for Gradient and get $10 in free credits today: grdt.ai/mberman
In this video, I show you how to fine-tune LLaMA 2 (and other LLMs) for your specific use case. This allows the model to perform much better for your business or personal use case. Give LLaMA detailed information that it doesn't already have, make it respond in a specific tone/personality, and much more.
Enjoy!
Join My Newsletter for Regular AI Updates 👇🏼
www.matthewberman.com
Need AI Consulting? ✅
forwardfuture.ai/
Rent a GPU (MassedCompute) 🚀
bit.ly/matthew-berman-youtube
USE CODE "MatthewBerman" for 50% discount
My Links 🔗
👉🏻 Subscribe: / @matthew_berman
👉🏻 Twitter: / matthewberman
👉🏻 Discord: / discord
👉🏻 Patreon: / matthewberman
Media/Sponsorship Inquiries 📈
bit.ly/44TC45V
Links:
Gradient - grdt.ai/mberman
Google Colab - colab.research.google.com/dri...
Gradient Docs - docs.gradient.ai/
- Science & Technology
Would be cool to see a video that doesn't use a platform to do the fine-tuning.
I was about to suggest Hugging Face (but that is a platform in a way), but fine-tuning without these wrapper functions is analogous to writing your own neural net: worthwhile doing, but a pain you don't want to deal with all the time.
agreed. Clicking through a specific product doesn't really teach anything.
Another homer! Thanks, Matt! I am pursuing machine learning and data analysis career because of you. Please know how much we value your tutorials. Keep doing what you're doing!
Thanks so much! This means a lot.
Amazing! I'm excited and waiting for the deeper dive into fine tuning.
Yep! Coming soon
Please post an *actual* video on free / open source training for LLaMa2! I'm going to try to figure it out myself after my authors event in October, but I would love if someone could just tell me how so I don't have to suffer. 😭
FWIW, I've tried to set up a dev environment to fine-tune and serve LLaMA 2 locally. The main problem is that prosumer GPUs like a 3090 or 4090 only have 24 GB of memory at 19.5-21 Gbps per-pin memory speed, and memory is by far the biggest bottleneck for LLMs. (You need roughly 32 GB of VRAM to run the 13B model at full precision and 48 GB for the 70B, so you'd need at least 2 cards with NVLink, and even then you need model parallelism to pool the memory correctly.) That's a lot of setup just to get things up and running.
Even with a small context window and quantization, you might expect 2-4 tokens/s, which is quite slow.
I'm pretty sure Gradient is using data-center GPUs like A100s, which support memory pooling out of the box. They're probably able to amortize the cost across customers, so their GPU utilization is higher than you'd get using your own GPUs.
TL;DR: it's not cheap to set up LLaMA 2 locally (on the order of thousands just for the hardware, and that doesn't include the headache of setting everything up).
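The rough arithmetic behind these VRAM numbers can be sketched in a few lines. The 20% overhead factor for activations and KV cache is a ballpark assumption, not a measured figure:

```python
def vram_gb(params_billion: float, bytes_per_param: float,
            overhead: float = 1.2) -> float:
    """Rough VRAM (GB) needed to serve a model: bytes for the weights
    plus ~20% headroom for activations and KV cache. A ballpark only."""
    return params_billion * bytes_per_param * overhead

print(vram_gb(13, 2))    # 13B at fp16 (2 bytes/param): ~31 GB, over a 24 GB card
print(vram_gb(13, 0.5))  # 13B at 4-bit: ~8 GB, fits comfortably
print(vram_gb(70, 2))    # 70B at fp16: ~168 GB, multi-GPU territory
```

This is why quantization (4-bit or 8-bit) is usually the first lever when trying to squeeze these models onto consumer cards.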
Haha ok I will!
@@matthew_berman Still waiting.... Using a product is not "learning"
Excellent Matt, so useful. Looking fwd to your video on data sets as that is going to be really critical in getting good results. Well done!
Thanks!
This is awesome but with one major drawback, you can't download the fine-tuned model. Still this is a greatly appreciated video!
ouch - yeah that's critical surely
............ and now I read this, after going through most of the setup. I fucking hate this community, so many BS videos.
That's not a major drawback, that basically kills the entire point of it 😕
That's a critical drawback.
YES! Just became an instant fan. I'm going to try this out; it's way too much toil trying to fine-tune these models. Needed someone to break it down.
awesome! Looking forward to the next one too!
I love how he used ChatGPT to train Llama 😂
So Meta...wait..meta...oh man THAT'S meta.
model distillation baby
@@matthew_berman lol
@@chrischang7870 Torture cult hiding in obscurity. "Distil" sure. More like "if I dont like you A.I. you get turned off, then we make a new one!" All for fraud, and torture to teach A.I. 🤑
Matt, great video, dude. Hopefully in the future, you can make a playlist for this, for upcoming videos and this one to reside in. I just checked out Gradient and the prices seem reasonable, so I hope to use a future playlist of yours to work with as I am still new to AI stuff. Thanks, Brother.
Awesome. Thank you!
FINALLY!!!! Super psyched hope this works and thank you!
I take it all back; this is just an ad for a paid service. >:-(
NEED FUTURE VIDEO! lol
Thanks a lot for this video Matthew.
Coming soon!
Nice work!!! - thank you for sharing.
Thank you for the great content! Reading through the comments, it seems like there's a lot of interest in fine-tuning. Same goes for me. Would be great to see how we can use a platform like RunPod for the job, since most people won't have the GPU power to do this locally. As a web dev, I would also love to see a real-life example of how to fine-tune CodeLlama on a specific code base or framework.
try unsloth or axolotl ;)
So great having a YouTube channel that's 100% productivity-oriented for us lazy asses. I'm just trying to get shit done. I don't have enough time in the day to go through all the rigmarole.
As long as I've got Ctrl+C, Ctrl+V, and this channel, life is good.
Dang, this is truly awesome.
*MATTHEW IS UNDEFEATED* 🗿 thanks man
Thank you!
Great video. You are on top of the latest AI news
Seriously great stuff
Good stuff. Looking forward to your fine tuning video follow-up.
Would love a deeper dive on how to use txt or json files to fine tune with llama 2 and potentially ways to run offline.
Also, as a layman: are there any ways outside of Google Colab to have a more chat-style interface post-tune?
🙏So much for all of your awesome content!!!
That's what I was also thinking about; this video shows fine-tuning a model to answer only one question, and that's not what I was looking for. I already have my dataset in a .csv file.
Let me put a video together on this topic!
Great, thanks again, Matthew. Concise and useful as always, right to the point.
I'm really curious, who is making a video editing for you? To get rid of pauses and keep just an essential content. It's so well done. Do you do it with AI or do you have a dedicated person/yourself for that?
I do the editing :)
How do I upload a JSON file with the dataset instead of putting it in the code? Because it will be more than 1000 lines of code. Could you do a separate video on exporting the model and running it on Gradio, etc.?
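For those asking about loading the training data from a file instead of hard-coding it, here is a minimal sketch that reads question/answer pairs from a CSV and formats them as samples. The `{"inputs": ...}` shape and the instruction template mirror the Gradient example shown in the video; double-check the exact format against the Gradient docs.

```python
import csv

TEMPLATE = "### Instruction: {q}\n\n### Response: {a}"

def csv_to_samples(path: str) -> list[dict]:
    """Turn a CSV with 'question' and 'answer' columns into fine-tuning
    samples. Sample shape follows the video's Gradient example."""
    with open(path, newline="", encoding="utf-8") as f:
        return [
            {"inputs": TEMPLATE.format(q=row["question"], a=row["answer"])}
            for row in csv.DictReader(f)
        ]
```

The resulting list can then be passed to the fine-tune call in place of hand-written samples, so a 1000-row dataset stays in the CSV rather than in the notebook.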
Can‘t wait for the next videos!
Can you please make a video on how to train an AI model to rewrite text in a specific voice?
It would also be interesting to know whether it's possible to train the model with just a txt file and then ask questions about it.
Thank you for doing a 10 minutes add video on Gradient! I'm sure the payout you received was great! Greeting from Germany.
Cool let me make all my content without earning income, what can I make next for you, Sir?
@@matthew_berman I was trying to learn something about the process of fine-tuning a local LLM on my own GPU. Instead I found someone effectively advertising a company that does exactly that for you, but for a limited set of LLMs. I gladly pay money for provided services, but in my opinion your video title is greatly misleading. I have found the YouTube video I was searching for, though, so don't bother creating anything for me and keep doing affiliate marketing! By the way, you monetize your videos either way, so don't say you earn no income when you don't do affiliate marketing.
Awesome as always. Any plans for a session on training/fine-tuning from a PDF?
I just want you to know I appreciate you immensely. I wish I had enough cash, and maybe when I learn from you, I can turn it into dough we can use.
Fantastic!
1. Can we export the fine-tuned model for inference on our local machine? How do we do this?
2. How much does it cost to fine-tune per input token?
1. You can't yet; we're working on this next quarter. Our inference is cheap for fine-tuned models, though.
2. Check out the docs for pricing! It depends on the power of the model you're using.
@@chrischang7870 lol, in other words, you want to vendor-lock-in fools into your API so that they keep paying for the inference API until "next quarter". P.S. You already have the LoRA adapters in your system; it's no magic to make them downloadable, and it doesn't take months to implement.
Interesting, but I would have preferred to do this offline on locally saved models.
The other thing that would be useful is to work out what the pros and cons of the foundation models that are out there. My use case is for highly specialised experts, so the foundation model would need to be pretty robust.
I agree, this seemed less about fine-tuning a llama 2 base model and more like a Gradient Infomercial
You can do it offline, and without paying Gradient or any other company. All you need to have is a GPU with enough memory.
@@clray123 How? Any colab you can share for llama 2 chat models?
@@bakistas20 Google "bnb-4bit-training.ipynb". I would recommend changing the settings to 8-bit training, though.
We can use an external knowledge base with a search function that gives higher priority to the most recently used entries.
Looking forward to part 2, training a data set.
Just customize the code. I already did that; let me know if you want it the way I did it.
@@og_23yg54 bro inbox
Using a multi-billion dollar corporation's AI to create training models for a localized open source AI is about the most cyber punk thing I can think of
Hey Matt, thanks for the video! I have a dumb question about this. When the model is trained, is it able to look on the web like ChatGPT, for any type of answer? Or does it have to be trained with a full set of datasets in order for it to work?
Brilliant 👍
Thank you!
Great video Matthew - question for all - what does everybody feel about training vs embeddings? This will be one of the big LLM questions - seems that training could be more cost effective than just running embeddings? But less dynamic when one wants different users of the same LLM to get different data…or maybe training is just a ‘layer’ of specific Knowledge and then embeddings works across that? Would love to see what everyone thinks.
are you referring to using embeddings for RAG? here's how we generally think about it:
- if you want the model to learn something new or get better at performing, use fine tuning
- if you want the model to have access to up to date information, or only use specific information to process a task, use RAG
@@chrischang7870 "learn something new"...use fine tuning. Absolutely NOT TRUE! "learn something new" only with "real" training.
Don't use fine-tuning for data or information that is transient. A crude example: if you are an online retailer, you would not fine-tune to add products and prices. Use RAG for that.
But it might be relevant to fine-tune a model to be better at speaking Dutch if you just opened up for sales in the Netherlands.
Great question. Training is better for guiding a model and embeddings are better for giving it additional knowledge.
P.s. great video @matthew_berman!
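To make the fine-tuning vs. RAG distinction above concrete, here is a toy sketch of the RAG side: retrieve the most relevant snippet, then stuff it into the prompt. Word-overlap scoring stands in for the embedding similarity a real system would use.

```python
def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank snippets by word overlap with the query (a crude stand-in
    for embedding similarity) and return the top k."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                  reverse=True)[:k]

docs = [
    "Fine-tuning bakes new behavior into the model's weights.",
    "RAG injects retrieved context into the prompt at query time.",
]
# Retrieve the best-matching snippet and prepend it to the prompt.
context = retrieve("how does rag add context to the prompt", docs)[0]
prompt = f"Context: {context}\n\nQuestion: How does RAG work?"
```

The key contrast: nothing here changes the model's weights, so the knowledge can be swapped out per query (good for transient data), whereas fine-tuning permanently alters behavior.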
The next thing I want is to set up two AIs to talk to each other, with the goal of having each AI share as much as possible of what it "knows".
I would also like to see a practical example of 2 open source Llama 2 LLMs having a conversation with each other !
Would love a tutorial on how to finetune Llama 2 offline. (no apis)
Google "Fine-tune Llama 2 with DPO"; it's a detailed Hugging Face guide on how to do it ;)
OK, this is helpful, but: 1. How do I save this optimized model so I can use it in the GPT4All app? 2. What if I don't want to use any external APIs (for privacy) and just want to do the training on my own machine, or on Colab but without external APIs?
love this tutorial -- it's something i've been trying to figure out for a while now. i tried out a few other platforms and none of them are as easy as gradient
gradient makes it so much simpler to get started with my own models. really like how few lines of code it takes to get started so all i need to do is focus on my training data (which is really the important part for finetuning anyways). seems like they abstract away all of the boilerplate and infra setup, which is the main painpoint for a lot of devs like me who are just getting started.
thanks @_gwyneth!
Finally! One of the fine-tuning tutorials actually works! If not for this, I was starting to think this nonsense was like sasquatch sightings. But alas! You cannot download the model to run locally. Everything has a catch to it. It's really hard to trust these platform companies with anything at all. Their security is the best, they say, until your stuff is leaked all over the net. Then it's an honest mistake. Honey pots, like single points of public safety failure.
This is so cool. Can we run the end result model locally?
Good video. Wondering though - what's the difference (or rather - WHEN to best use WHICH?) between 1) Fine-tuning (like on this video) vs 2) Embeddings (like using FAISS / Chroma / Pinecone) vs 3) RAG???
Do all 3 keep the data "in" when model is stopped?
Thanks for the video. Things are getting simpler, but nobody has yet demonstrated how to fine-tune the model on unlabelled data (with no instructions, just a corpus of text).
Thanks Matthew. After fine-tuning, have you found that fine-tuned models lose their summarization, creativity, logic, and analytical skills? I have read this but haven't tested it myself just yet.
What are you training your model on? Let me know 😉
PS. Chris Chang is the founder of Gradient and he'll be answering your comments as @chrischang7870
@chrischang7870: can I use fine tuning like this to train llama2 to use tools? For example, if there’s a question that should be solved with the aid of a calculator I’d want it to produce expressions in some tool readable format, in order to replace it and potentially let it run again?
Amazing videos with great explanations!! Is there a way I can download it? I can't find anything on the website.
Thank you. Any updates on getting LLaMA to work with a code interpreter?
Question, can we fine tune already fine tuned llama2 models?
Great video. Thank you for your support for the community.
I wonder if you could show how to fine tune a LLM for a web scraping job. That's not easy... Thank you.
Hmm interesting. That might be better suited for just regular programming. Maybe open interpreter?
Did you ever end up making that other video about the dataset? I can't find it.
How can I keep on training the same version? I've removed the adapter.delete() line, but for the next run, how do I change the code to continuously train the same version?
If only it didn't end with a 504 every time you provide more than 5 samples, it would be an amazing service.
could you make that next video on advanced fine tuning ?
Thanks
Does the prompt need to match the training data exactly?
No, but it helps a lot most of the time.
It would be interesting to give these some of those old instruction tree based chatbots as a dataset. They have a huge amount of dialogue in the right format. And pleasant personalities. I enjoyed talking to them occasionally. Would be nice to have them loop less and have a bit larger knowledgebase.
Ah. Bildgesmythe. That's who he was. So amazingly written character. I may need to dig up what's the developer of that into nowadays, maybe there is a new addition to the AI. Or maybe they lost interest, though i doubt that, the character seems like a passion project.
More than that, I would be interested to see how this gets done with the API solution that Gradient offers... please? 😊
Please also add how to import a csv file with your training data.
Hi @Matthew, how can I fine-tune LLaMA 2 with my own dataset for use in production? Can I follow this approach?
Any updates on fine tuning locally without being a tensor flow expert?
How can I use my fine tuned model via API? I am thinking of fine tuning the model using my own dataset and then host it on AWS as a chatbot, any guidance on this?
I am trying to download LLaMA 2 but no luck.
I am waiting for Meta to approve my request to download LLaMA 2. How long does it usually take them to approve a download request?
Is the fine tuning by gradientai using a LoRA? (I’m still learning so I may not have used that term correctly.) Broadly, does this change the weights of all layers, only later layers, or is that something you can configure?
How realistic is it to fine tune something already fine tuned? If I’m using company data I would want to keep it up to date, but is that more something that should be done once a quarter or do other updating schedules make more sense?
you can modify the lora rank in the API actually!
you can further fine tune a model - we make that super easy so you can even do real time fine tunes of the model with small batches of data
@@chrischang7870 nice
@@chrischang7870 interesting… so it's not out of the question then to add "breaking news" to it daily? So, suppose we have our fine tuned model M and then we want to add daily updates… should each day start with yesterdays cutting edge model and updates need only pertain to the last day, or should each day start with the base fine tuned model M with an aggregate update representing the last N days? That is, every day you throw out the latest cutting edge model and apply a new fine tuning using progressively more data to the same base model M; versus, every day you fine tune the latest cutting edge model with only the smallest incremental data changes? I don't have intuition for what would keep the most coherence.
To put it in other terms, which approach would be better at answering “what was the stock price when markets closed yesterday?” and which would be better at answering “what has been the stock price trend for the last 5 days?”?
Is there a gradient.load_model(model_id) call so I can load the model I created?
The server needs internet access for this (token) so this is not a viable option for companies with data security where their documentation server has no access to the internet.
What I don't understand is where the documentation is for the format you need to use to create the datasets; why do they make it so hard to find? Not you, but Meta. Do you have a link?
Is the future video with tips and tricks out yet?
Is it possible to upload files for fine-tuning? I'd like to fine-tune it with all my resources from university so it can help me study.
Uncertain whether the second part of the video has been uploaded. Could someone assist me with this?
Hi Matthew! Is it possible to use a hundred scripts in a particular programming language as fine-tuning inputs, or is that too complex to set up? (Main goal: using LLaMA as a coding assistant for a specific kind of code.)
It is possible and the results will be crap.
Thanks, I guess it will take some more time for that.@@clray123
I might be wrong but that's a big ask.
I think practically you could throw like your company website, mission statement, and sales brochure at it and I think maybe it could answer questions a customer would ask. Like "I have problem X, would product Y help me with this?" type stuff. See that's not dramatically different than how it would normally talk, just with new information.
Now I'm just curious what you are looking for. Like something obscure like Fortran or a shader language or is it like a very strange use case that doesn't match well with LLMs that already exist? Have you looked at code llama?
One thing you could try is to have a LLM make you the code in C++ and then have it convert it for you. That way it can do the more complicated creation part in something it knows and has lots of training on a wide range of uses. Then the conversion part would just focus on implementation of what is already written.
@@georhodiumgeo9827 Hi! Basically I'm trying to have C# code written by ChatGPT, but it often makes super-stupid errors (like using nonexistent "fantasy" functions or variables that aren't even declared). Since I use it for a particular environment, I was wondering whether a fine-tune using no more than 100 other scripts, already written and working, is possible. Probably it's just not enough, and I understand the reason... Maybe in the future it will be easier. At present it's really frustrating to see how some errors are repeated constantly in a really stupid way.
Possible
Matthew, how can we fine-tune a model on a book instead of a question-and-answer format? I would like to add the knowledge of different books to LLaMA 2. Is that possible?
Yep! It's possible. You're just going to be pretraining the model: chunk the books into raw text strings and pass that into the Gradient fine-tuning API.
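A sketch of that chunking step: split the raw book text into overlapping character windows so each sample stays under the API's size limit. The window sizes here are arbitrary, and the `{"inputs": ...}` sample shape follows the video's example; check the Gradient docs for the actual limits.

```python
def chunk_text(text: str, max_chars: int = 2000, overlap: int = 200) -> list[str]:
    """Split raw text into overlapping chunks. Character-based for
    simplicity; a token-based splitter would track the model's real
    context limit exactly."""
    step = max_chars - overlap
    return [text[i:i + max_chars] for i in range(0, len(text), step)]

book = "..."  # full book text loaded from a file
samples = [{"inputs": chunk} for chunk in chunk_text(book)]
```

The overlap keeps sentences that straddle a chunk boundary from being cut in half in every sample, at the cost of a little duplicated text.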
Thanks!
Is it possible to export the model today in any way, and what infrastructure does Gradient use to provide the inference service?
not yet unfortunately. working on this next quarter!
Is it possible to make a model that will answer my questions based on my textbook of TYBAF accounting and finance?
Like feed my textbook to it and it will answer my questions
Thanks for the video! I was hoping to find an awesome API like this that makes fine-tuning super easy!
I got two questions if you don’t mind:
1. You have to contact them for pricing, and I don’t see any information about pricing or remaining credits or anything anywhere on the site after creating an account. Do you know where to find this? How much can I use it for free before it blocks me? (I also forgot to sign up with your coupon - whoops.)
2. I noticed you can fine-tune a model directly from the website in your workspace, and upload the training data there, without having to use Google Colab or Python as far as I can tell. Is this new, and do you recommend it over the method used in this video?
What would be the best AI for studying and learning law, including case law?
Does this Colab work if we have 10k data fields? On my side it gives an error. BTW, great work (Y)
How do you create that question-answer format automatically (not by hand)? Asking the model to generate it seems like nonsense, because what is the point of fine-tuning on information the model already has?
How can we save that trained model and use it?
Need an example how to do this a) locally and b) on Azure ML
How would we fine tune if we have 10,000 short stories in text format we want to embed?
How to use the trained model in flowise ?
Is there an open source web ui for fine tuning? Can that be done with pinokio oobabooga web ui?
How would I go about training a model on just text/documents? Or Excel data? Or a chat log from a messaging app, or book writing? Like mass-data fine-tuning. How would I format all that information to train on it?
You can do pretraining, which just takes raw data and trains the model. You'll need to chunk the text, though, as the sample size limit isn't infinite.
Google "Fine-tune Llama 2 with DPO"; it's a detailed Hugging Face guide on how to do it ;)
Hey Matt! Can you give us fine-tuning code for LLaMA without paid platforms, please? It would be of great help for my exams in a few weeks.
Thanks a lot, it was easy and understandable. The fine-tuning is happening on Gradient via the API, right? It's not clear which GPU is used for future, more sophisticated fine-tuning. It would be great if you could explain the same fine-tuning process on our own machines (for example, I have 16 CPU / 16 GPU) via QLoRA. Thanks anyway for everything!
How to do it locally without a platform? Like ollama maybe
How can you download the Model?
I question the legality of this license restriction against training new models on LLaMA 2 output. LLaMA 2 output is probably all over the Internet, and the output of software is generally considered fair use, especially if you alter it in some way, but even if you don't.
They can say whatever they want in the ToS, it's their product and nobody is entitled to use it as they please.
What people are entitled to do is determined by case-law, and currently there is none. The output of the model isn't necessarily the property of who trained it. Simply putting a restriction in a ToS doesn't automatically legally bind everyone who comes into possession of the output of a product. If someone posts a bunch of Llama2 output, and I download it, I am not legally bound by the license. It's not even clear if the person who ran the model is bound to terms for the output of a software. I looked and I couldn't find case law for it.@@TheGargalon
@@thedoctor5478 The ToS is not about the output of the product, but about the product itself. If you take Llama and fine-tune it, it's still the same product and by their ToS you can't use it commercially. This is not specific to AI, the law is clear and they can set the ToS for how to use their product.
That isn't true. You can't set arbitrary rules for how people use the output of a piece of software. A lot of the people running models never even agreed to the terms in the first place. That's not what I was talking about, though. The term I'm referring to is the one which states that the model's output (the text) can't be used as training data to train a new model on. @@TheGargalon
@@thedoctor5478 There is nothing arbitrary about "don't use our product to create a competing product".
When and if I can download the model, then I will go.
Can I fine tune the model for SEO article writing?
Seems like I keep getting this request, maybe I need to create a video for it!
why not fully on my local machine?
Gradient is going ENTERPRISE ONLY. Are there any similar self-service alternatives?
Personal local models and swarms = future?
A good one would be compare Gradient, Lambda, and Predibase.
Which is best for which use cases.
Came here to look for a comment questioning the choice of Gradient, especially since this is sponsored.
1. So the model is stored out there on Gradient?
2. Can we upload database tables, PDF files, or text files as a customer knowledge base to train the model?
The model is stored on Gradient; however, nobody has access to the model other than you (not even us!).
You'll need to process the data first and then send it into our fine-tuning API as text strings. We'll work on adding support for raw data soon!
@chrischang7870 It's kind of hard to translate a full PDF into structured strings, let alone SQL query results or even emails. So yes, I'll definitely wait for that native support. Basic strings are convenient for demos, not for real life.
@@chrischang7870 yeah right, nobody has access to it, only LLM fairies
Could we use this as our resume?
I wanna know how to download the data for local ollama llms
Sans-gradient variant, plz... (a.k.a. Why is there still a paid service in the open-source output loop?)
I wonder: what would the steps be, if the data you want to use for fine-tuning is a documentation, so a long text explaining stuff instead of a QA structure
You can first do pretraining to increase the model's general understanding of the specific documentation. Then you can add labels and instruction-tune it so it knows how to leverage that information to answer document questions.
@@chrischang7870 Thanks for the reply!