The EASIEST way to finetune LLAMA-v2 on local machine!

Abhishek Thakur

zhlédnutí 168 779

Přidat do
- Můj playlist
- Přehrát později
Sdílet

Sdílet

Vložit

Velikost videa:

Zobrazit ovladače přehrávání

Automatické přehrávání

Přehrát

čas přidán 19. 07. 2023
In this video, I'll show you the easiest, simplest and fastest way to fine tune llama-v2 on your local machine for a custom dataset! You can also use the tutorial to train/finetune any other Large Language Model (LLM). In this tutorial, we will be using autotrain-advanced.
AutoTrain Advanced github repo: github.com/huggingface/autotr...
Steps:
Install autotrain-advanced using pip:
- pip install autotrain-advanced
Setup (optional, required on google colab):
- autotrain setup --update-torch
Train:
autotrain llm --train --project_name my-llm --model meta-llama/Llama-2-7b-hf --data_path . --use_peft --use_int4 --learning_rate 2e-4 --train_batch_size 12 --num_train_epochs 3 --trainer sft
If you are on free version of colab, use this model instead: huggingface.co/abhishek/llama.... This is a smaller sharded version of llama-2-7b-hf by meta.
Please subscribe and like the video to help me keep motivated to make awesome videos like this one. :)
My book, Approaching (Almost) Any Machine Learning problem, is available for free here: bit.ly/approachingml
Follow me on:
Twitter: / abhi1thakur
LinkedIn: / abhi1thakur
Kaggle: kaggle.com/abhishek

Komentáře • 298

@linuxmanju Před 5 měsíci ⁺³²
Anyone comes across this in 2024 (jan ), the command switches with new autotrain version is autotrain llm --train --project-name josh-ops --model mistralai/Mistral-7B-Instruct-v0.2 --data-path . --use-peft --quantization int4 --lr 2e-4 --train-batch-size 12 --epochs 3 --trainer sft . Great, Video, thanks Abhishek
@BrusnickiRoberto Před 5 měsíci
After finetuning it, how to run it?
@vinodb4339 Před měsícem
@@BrusnickiRobertoHey hi did you run it??
@BrusnickiRoberto Před měsícem
@@vinodb4339 no
@tarungupta83 Před rokem ⁺⁴
That's Awesome, nothing better than this way of training large language model. Super easy ❤
@andyjax100 Před 3 měsíci
Keeping it this simple is something very few people are able to do. Very well explained.
This can be understood by even a beginner. Atleast the execution if not the intuition behind it. Kudos
@WeDuMedia Před 3 měsíci
Incredibly helpful video, I appreciate that you took the time to create this! Great stuff
@tarungupta83 Před rokem ⁺⁵
Appreciate it, and request to continue making such videos🎉
@syedshahab8471 Před rokem ⁺²
Thank you for the on-point tutorial.
@charleskarpati1129 Před 7 měsíci
Thank you Abhishek! This is phenomenal.
@MasterBrain182 Před rokem ⁺¹
Astonishing content Man 🔥🔥🔥 🚀
@AICoffeeBreak Před rokem ⁺¹¹
Amazing, tutorials at light speed! Llama 2 was just released! 😮
@abhishekkrthakur Před rokem ⁺²
🙏🏽
@AIOdysseyhub Před rokem
😂😂Yeah exactly
@weebprogrammer2979 Před 11 měsíci
This man is a genius lol
@abhishekkrthakur Před rokem ⁺²⁵
Please subscribe and like the video to help me keep motivated to make awesome videos like this one. :)
@arpitghatiya7214 Před 10 měsíci
Please make a video on Llama2 + RAG (instead of finetuning)
@xthefoetusx Před rokem ⁺³
Great video! Would be great if in some future vid you could go into depth on the training hyperparameters and perhaps also talk about what size your custom datasets should be.
@abhishekkrthakur Před rokem ⁺⁴
sometimes I do that. however, this model would have taken wayy too long to train. im training a model as i type here and if i get good results ill share both model and params 🙂
@emrahe468 Před 11 měsíci ⁺¹
@@abhishekkrthakur guess no good luck with the training :(
@nirsarkar Před rokem
Excellent, thank you so much. I will try.
@bryanvann Před rokem ⁺¹⁸
Thanks for the tutorial! A couple questions for you. Is there an approach you're using to test quality and verity that the training data has influenced the weights in the model sufficiently to learn the new task? And second, can you use the same approach for unstructured training data such as using a large corpus of private data to do domain adaptation?
@user-nj7ry9dl3y Před 11 měsíci ⁺¹
For fine-tuning of the large language models (llama-2-13b-chat), what should be the format(.text/.json/.csv) and structure (like should be an excel or docs file or prompt and response or instruction and output) of the training dataset? And also how to prepare or organise the tabular dataset for training purpose?
@aaronliruns Před 11 měsíci ⁺⁷
Great tutorial! Can you also put up one video teaching on how to merge the fine tuned weights to the base model and do inference? Would like to see an end-to-end course. Thank you!
@adamocheri3513 Před 10 měsíci ⁺²
+1 on this question !!!!
@devyanshrastogi Před 8 měsíci
any updates guys?? I really want to know how to merge the fine tuned model with the base model and do the inference. Do let me you have any resources or insights about the same
@kopamed5024 Před 6 měsíci
@@devyanshrastogi also need this answered. have you guys had any success?
@YuniYoshi Před 8 měsíci ⁺¹
There is only one thing I want to see. I want to see you using the final result and prove it actually works. Thank you.
@jdoejdoe6161 Před 11 měsíci ⁺¹
Hi Abh
Your method is inspiring and commendable. How do we read the csv or json training dataset we prepared instead of the hugging face dataset you used?
@mautkajuari Před rokem
Informative video, hopefully one day I will get a task that requires me to finetune a LLM
@abhishekkrthakur Před rokem ⁺¹
or you can just do it for fun 🤗
@prachijadhav9098 Před rokem ⁺²
Nice video Abhishek!
I am curious to know about custom data for LLMs. What is the ideal (good quality) data size (e.g., #rows), to fine-tune these models for good performance, not necessarily it should be big data of course.
Thanks!
@sohailhosseini2266 Před 9 měsíci
Thanks for sharing!
@JagadishSongapagounder Před rokem ⁺¹
Great Job :)
@stevenshaw124 Před rokem ⁺³
what kind of GPUs do you have? how big was your dataset and how long did it take to train? what is the smallest fine-tuning data set size that would be reasonable?
@deltagamma1442 Před rokem ⁺¹
How do you set the training data? I see different people using different formats? Does it matter or is the only requirement that it has to be structured meaniningfully?
@r34ct4 Před rokem
Thanks for the comprehensive tutorial. Can this be done using chat logs to build a clone of your friend? I have done this with GPT3.5 finetuning using prompt->response. The prompts are questions generated by ChatGPT based on the chat log message. Can the same thing be done with Instruction->Input->Response? Thank you very much man.
@ajaytaneja111 Před 11 měsíci ⁺⁴
Hi Abhishek, is the auto train using LORA or prompt tuning as the PEFT technique?
@manojreddy7618 Před rokem
Thank you for the video. I am new to this, so I am trying to set it up on my windows PC. When I am trying to install the latest version of autotrain-advanced==0.6.2, I get an error saying: trition==2.0.0.post1 cannot be found. Which I believe is only available on Linux. So is it possible to use autotrain-advanced on windows?
@safaelaqrichi9096 Před 11 měsíci
Thank you for this interesting video. How could we change the encoding to ''latin-1' in order to train on french language ? thank you.
@nehabidkar7377 Před 11 měsíci
Thanks for this great explanation. Can you provide the link to you training data?
@abramswee Před rokem
thanks for sharing!
@oliversilverstein1221 Před 11 měsíci
hello, thank you. i really need to know: does this pad appropriately? also, how does it internally split it into prompt completion? Can i make up roles like ### System? does it complete only the last message?
@boujlidamohamed Před rokem ⁺¹
First thank you for the great tutorial , I have one question : I am trying to finetune the model on Japanese , do you have any advice for that ? I have tried the same script as you did but it didn't work; it produced some gibberish after the training finished , I am guessing it is a tokenizer problem, what do you think ?
@vasuchandra Před rokem
Thanks for the tutorial.
On a Linux 5.15.0-71-generic #78-Ubuntu SMP x86_64 x86_64 x86_64 GNU/Linux machine, I get following error when training llm with the small dataset. File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2819, in from_pretrained
raise ValueError(
ValueError:
Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
the quantized model. If you want to dispatch the model on the CPU or the disk while keeping
these modules in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom
`device_map` to `from_pretrained`.
What could be the problem? Is it possible to share the data.csv that you have with single row that I can take as reference to test my own data?
@dr.mikeybee Před 8 měsíci
Nice job!
@returncode0000 Před rokem
I just bought a RTX 4090 Founders Edition. Could you tell on a particular example were I could run into limits with card when training LLMs locally? I personally think that I'm safe for the next few years and I will not run in any problems.
@utoubp Před 6 měsíci
Hi Abhishek,
Much appreciated. How would things change if we were to use simple fine tuning? That is, just a large single code file to learn from, to tune code-llama, phi2, etc..
@rohitdaddekar2900 Před rokem
Hey, could you guide us how to train custom dataset on llama2? How to prepare our dataset for training?
@jeremyarancio1683 Před rokem
Nice vid
Should we label input tokens to -100 to focus the training on the prediction?
I see no one doing it
@jaivalani4609 Před 11 měsíci
Thank you ,what is diff between instruction and input
@jessem2176 Před rokem
Great Video. i love it and can't wait to try it. Now that Llama2 is out... is it better to FineTune a model or try to create your own Model?
@mariusirgens5555 Před 11 měsíci
Superb video! Does autotrain allow to export finetuned model as GGML file? Or can it be used with GGML file?
@_Zefyr_ Před 9 měsíci ⁺¹
Hi I have a question , it´s posible to use "autotrain" without cuda, with rocm support of AMD GPU ?
@chichen8425 Před 3 měsíci
I know it could be too much but could you also make a video of how to prepare the data? I have like 'question' and 'answer' but I am strugging to make it to a trainable data set into that kind of csv so I could use it!
@sandeelg_lite Před 11 měsíci ⁺¹
I trained model using autotrain in same way as you suggested and model file is stored.
Now I need to use this model for prediction. Can you shed some light on this as well?
@deepakkrishna837 Před 8 měsíci
Hi when we tried fine tuning MPT LLM using autotrain, getting the error ValueError: MPTForCausalLM does not support gradient checkpointing. Any help you can offer on this pleas?
@spookyrays2816 Před rokem
Thank you brother
@kishalmandal5676 Před rokem
How can i load the model for inference if i stop training after 1 epoch out of 3 epochs.
@DevanshiSukhija Před 11 měsíci
How is your ipython giving suggestions? I want the same set up. Please make a video on these types of set up that assists in coding and other processes.
@cloudsystem3740 Před rokem
thank you very much
@0xeb- Před 11 měsíci
How do you deal with response in the dataset that has newline characters?
@Truizify Před rokem
Thanks for the tutorial! How would you modify the code to train on a dataset containing a single column of text? i.e. trying to perform domain-specific additional pretraining?
I would remove the peft portion to do full finetuning, anything else?
@sanjaykotabagi4407 Před rokem
Hey, Can we connect. Even I need help on similar topic. We can discuss more ...
@user-bq2vt4zz2e Před 11 měsíci
Hi, I'm looking into something similar. Did you find a good way to do this?
@kunalpatil7705 Před 10 měsíci
Thanks for the video. i have a doubt that how can i make a package of it so others can also use it offline by just installing the application
@user-we6vc9co1b Před rokem ⁺¹
Do you have to use [INST]...[/INST] for indicating the instructions? I think the original Llama 2 model was trained with these tags, so I am a bit puzzled if you have to use the tags in the csv or they are added internally ?!
@abhishekkrthakur Před rokem
in this video, im finetuning the base model. you can finetune it anyway you want. you can even take the chat model and finetune it this way. if you are using a different format for finetuning, you must use the same format while inference in order to get the best results.
@protectorate2823 Před 10 měsíci
Hello @abishekkrthakur can I train summarization models with autotrain advanced?
@as-kw8dt Před 28 dny
If there are a multiple input values how that have to be inserted in the cvs data ?
@ConsultingjoeOnline Před 4 měsíci
How do you convert it to work with Ollama? I setup the model file and it doesnt seem to know anything from my training.
@eltoro2339 Před rokem
I added push_to_hub command but it didnt push.. how do I use it to test the output?
@mallorywestwood Před 11 měsíci
Can we do this on a CPU? I am using a GGmL model.. please share your thoughts
@tal7atal7a66 Před 4 měsíci
thanks bro ❤
@aakritisrivastava4789 Před 11 měsíci
I am trying to use the generated model using autotrain from_pretrained ,, but its giving me error does not appear to have a file named config.json. Does anyone have the code for predicting or help me with this issue
@sd_1989 Před rokem
Thanks!
@elmuchoconrado Před 11 měsíci ⁺⁷
As always very useful and short without wasting anyone's time. Thank you. Just I'm a bit confused about the prompt formatting you have used here - "### Instruction:
### Input:... etc" while Llama official is "[INST] {{ system_prompt }}{{ user_message }} [/INST]" and on TheBloke's page it says "SYSTEM: {system_prompt}
USER: {prompt}
ASSISTANT:"
@ahmetekizx Před 8 měsíci
I think this isn't mandatory, it is a suggestion.
@ajaypranav1390 Před 6 měsíci
Thanks for this great video, but how to fine tune or train for question answer data set
@yashvardhanjain1968 Před rokem
Thanks! Is there a way to push the trained model to hub after its trained and not using the --push_to_hub while training? Also, when I try to use push to hub, I get a "you don't have rights to create a model under this namespace". I am using a read token to access the llama model. Do I need to change it to a write token? Is it possible to use two separate tokens? (sorry, I'm super new to Huggingface) Any help is much appreciated. Thanks!
@abhishekkrthakur Před rokem ⁺¹
yes. you need to use a write token. you can remove push to hub and then push the model manually using git commands if you wish
@srinivasanm48 Před 3 měsíci
When will I be able to see the model that I have trained? Once all the training is complete?
@agostonhuszka8237 Před rokem
Thank for the tutorial!
How can I fine-tune the language model with a domain-specific unlabeled dataset to improve performance on that specific domain? Is it effective to leave the instruction and input empty and only use domain-specific text for the output?
@sanjaykotabagi4407 Před rokem
Hey, Can we connect. Even I need help on similar topic. We can discuss more ...
@abdellaziztekaya8596 Před 6 měsíci
Where can i find to code you worte and your dataset? I would like to use it as an exemple for testing
@marioricoibanez144 Před rokem
Hey! Fantastic video, but i do not understand at all the division into smaller chunks of the model in order to work in free version of collab, can you explain it? Thank you!
@abhishekkrthakur Před rokem
chunks are loaded into ram first. since larger chunks didnt fit in ram with all the other stuff, i created a version with smaller shards :)
@FlyXing16 Před 11 měsíci
Thanks Kaggle grand master :) you got an channel.
@manishsharma2211 Před rokem
The way Abhishek side eyes before stopping the video and resuming is is soo crazy 🤣🤣😅
@abhishekkrthakur Před rokem ⁺²
lol. big screen. button too far 🤣
@0xeb- Před 11 měsíci
How to shard as you mentioned towards the end?
@dhruvilshah7770 Před 4 měsíci ⁺¹
Can you make a video for fine tuning in silicon macs ?
@nirsarkar Před 11 měsíci
Can this be done on Apple Silicon, I have M2 with 24G memory?
@cesarsantosvisballambis5469 Před 10 měsíci
Hi , nice tutorial, could you please help me with this error ? , when I try to train the model I got this error : raise ValueError("No GPU found. Please install CUDA and try again.") , Do you know how to solve this ?
@shaileshtiwari8483 Před 10 měsíci
Is Gpu Machine necessary for llama 7b to be trained?
@eunoia7151 Před 11 měsíci
How do I use a dataset in the huggingface hub?
@takuyayzu Před rokem
Hi, I've tried running auto trainer with the sharded model on a dataset I created & uploaded on HF. However, when running the auto trainer I quickly get the following error:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 14.75 GiB total capacity; 10.15 GiB already allocated; 1.40 GiB free; 12.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
I'm using the free tier of Google Colab and I know you mentioned that it should be working in it. Do you have any idea what might cause this and what can be done to solve this issue?
@takuyayzu Před rokem ⁺¹
It seems changing the value of "train_batch_size" helped solve the issue. I've changed it from 12 to 4, as I've seen other examples/guides use it as well. Will try with 4 then and then maybe with higher values (6, 8, etc).
@ashishtater3363 Před 3 měsíci
I have llm downloaded can I fine tune it with downloading from huggingface.
@bhaveshbadjatya2914 Před 11 měsíci
When tying to use inference API for finetuned model I am getting 'error': "Could not load model XXXX/XXXX with any of the following classes: (,) How to resolve this ?
@Sehyo Před 11 měsíci
How can I turn this into a gptq version after finetuning?
@sebastianandrescajasordone8501 Před rokem
I am running out of memory when testing it on the free-version of google colab, did you use the exact same tuning parameters as described in the video?
@abhishekkrthakur Před rokem
yes. you can reduce batch size. note, you need to use different model path if you are on colab or it will run out of memory. see description for more details
@am0x01 Před 6 měsíci ⁺¹
In my experiment, it not create the [config.json] what am I doing wrong?
@aurkom Před 11 měsíci
How to change this for tasks like classification?
@saitej4808 Před rokem
How to fine-tune with text corpus data? Ex: if I pass latest news how model can understand/ memorise all able answer context based questions on facts
@crimsonalchemist856 Před rokem ⁺¹
Hey Abhishek, Thanks for sharing this amazing tutorial. Can I do this on my RTX 3070Ti 8GB GPU? If yes, what batch size would be preferable?
@abhishekkrthakur Před rokem ⁺²
8GB sounds a bit low for this. maybe try bs=1 or 2? but tbh, im not sure if it will work. Might work fine for a smaller model!
@jas5945 Před 11 měsíci ⁺¹
Very good tutorial. On what machine are you running this? I am trying to run it on a Macbook pro M1 but I keep getting "ValueError: No GPU found. Please install CUDA and try again." I have tried to do this directly on Huggingface and got "error 400: bad request"...so I cloned autotrain and ran it locally...still getting error 400. Do you have any pointers?
@nirsarkar Před 10 měsíci
Same error
@DavidJones-cw1ip Před 10 měsíci ⁺¹
Any chance you have the python scripts available somewhere? Thanks in advance.
@user-oh6ve3df7l Před rokem ⁺¹
Amazing content. One Q left: how can I run the model locally in inference mode after training? Anyone have a command for that?
@abhishekkrthakur Před rokem ⁺¹
czcams.com/video/o1BCq1KJULM/video.html
@EduardoRodriguez-fu4ry Před rokem
Great tutorial! Thank you! Maybe I missed it but, at which point do you enter your HF token?
@abhishekkrthakur Před rokem ⁺¹
You dont. You login using "huggingface-cli login" command. There's also a similar command for notebooks and colab. :)
@unclecode Před rokem ⁺¹
Beautiful content, I have a side question, what tool you are using to have "copilot"-like suggestion in your terminal? Thx again for the video
@jessem2176 Před rokem
I use Hugginfaces co pilot. - it works pretty well and super easy to set up and free..
@ahmetekizx Před 8 měsíci
@@jessem2176 Thanks for the recommendation, but did you mean HuggingFace Personal-copilot Blog?
@abhisekpanigrahi1033 Před 11 měsíci
How can we create sharded version of Llama 2
@anantkabra6825 Před 8 měsíci ⁺¹
Hello I am getting this error can someone please help me out with it: ValueError: Batch does not contain any data (`None`). At the end of all iterable data available before expected stop iteration.
@SorinBuda Před rokem
Any idea why Autotrain says `llm` is not available, only app?
AutoTrain advanced CLI: error: invalid choice: 'llm' (choose from 'app')
@abhishekkrthakur Před rokem
please update to latest version
@prathampundir5924 Před 2 měsíci
can i train llama3 also with these steps?
@rajhammeersinghhada72 Před 6 měsíci
Why do we need --mixed-precsion and --quantization both? Aren't they both doing the same thing?
@oxydol3456 Před 2 měsíci
which machine is recommended for fine-tuning LLAMA? windows?
@jaivalani4609 Před 11 měsíci
How do we evaluate model any auto.api ?
@ShotterManable Před rokem
Is there a way to run it on CPU? Thanks sir, I love your work
@sachinsoni5044 Před rokem
hey Abhishek, I am a full stack developer and interested in AI. I love to code. I tried learning DS but found no interest in juggling with data. How should i learn?
@manabchetia8382 Před rokem
Thank you. Can you please also show us how to train on GPU #3 or GPU#1 or both GPU#1&3 but not in GPU #0 in a multi GPU machine?
@abhishekkrthakur Před rokem ⁺⁴
CUDA_VISIBLE_DEVICES=0 autotrain llm --train ..... will run it on gpu 0
CUDA_VISIBLE_DEVICES=1,3 autotrain llm --train ..... will run it on gpu 1 and 3
@AkK-iq3bz Před 2 měsíci
how do you get access to the llama model?

Další v pořadí

Automatické přehrávání

All You Need To Know About Running LLMs Locally