Coding Llama 3 from scratch in PyTorch - Part 1
- Published May 5, 2024
- In this video series, you will learn how to train and fine-tune the Llama 3 model from scratch.
The goal is to code Llama 3 from scratch in PyTorch and create models with 3B, 6B, 35B, and 45B parameters. In this first video, you'll learn about upcycling, downcycling, and infini-attention (illustrative sketches below).
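As a taste of the downcycling idea (initializing a smaller model from a subset of a larger dense checkpoint's layers, as studied in the "Pre-training Small Base LMs with Fewer Tokens" paper listed below), here is a minimal, hypothetical PyTorch sketch. The checkpoint name, the `downcycle` helper, and the keep-the-first-half layer choice are illustrative assumptions, not the exact code from the video.

```python
# Hypothetical depth-downcycling sketch: build a smaller Llama-style model
# by copying a subset of transformer blocks from a larger dense checkpoint.
import torch
from transformers import LlamaConfig, LlamaForCausalLM

def downcycle(src: LlamaForCausalLM, keep_layer_ids: list[int]) -> LlamaForCausalLM:
    # Same config as the source, but with fewer transformer layers.
    cfg = LlamaConfig(**src.config.to_dict())
    cfg.num_hidden_layers = len(keep_layer_ids)
    dst = LlamaForCausalLM(cfg)

    # Copy embeddings, final norm, and LM head directly.
    dst.model.embed_tokens.load_state_dict(src.model.embed_tokens.state_dict())
    dst.model.norm.load_state_dict(src.model.norm.state_dict())
    dst.lm_head.load_state_dict(src.lm_head.state_dict())

    # Copy only the selected transformer blocks, preserving their order.
    for dst_idx, src_idx in enumerate(keep_layer_ids):
        dst.model.layers[dst_idx].load_state_dict(
            src.model.layers[src_idx].state_dict()
        )
    return dst

# Usage (checkpoint name is an assumption; any Llama-style model works):
# src = LlamaForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
# small = downcycle(src, list(range(src.config.num_hidden_layers // 2)))
```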
📄 Papers:
- Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints: arxiv.org/abs/2212.05055
- Pre-training Small Base LMs with Fewer Tokens: arxiv.org/abs/2404.08634
- Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention: arxiv.org/abs/2404.07143
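Since the Infini-attention paper is the densest of the three, here is a single-head sketch of how I read its core segment loop: a linear-attention readout from a compressive memory, a gated mix with local causal attention, then a simple linear memory update (the paper also describes a delta-rule variant). The shapes and names are illustrative assumptions, not the paper's reference code.

```python
import torch
import torch.nn.functional as F

def sigma(x):
    # Nonlinearity used for the linear-attention memory: ELU + 1.
    return F.elu(x) + 1.0

def infini_attention_segment(q, k, v, M, z, beta):
    """One segment of single-head Infini-attention (illustrative shapes).

    q, k, v: (seg_len, d)  queries/keys/values for this segment
    M: (d, d)              compressive memory matrix
    z: (d,)                memory normalization term
    beta: scalar tensor    learned gate between memory and local attention
    """
    seg_len, d = q.shape

    # 1) Read from the compressive memory (linear-attention retrieval).
    sq = sigma(q)
    a_mem = (sq @ M) / (sq @ z).clamp_min(1e-6).unsqueeze(-1)

    # 2) Standard causal softmax attention within the segment.
    scores = (q @ k.T) / d ** 0.5
    causal = torch.triu(torch.ones(seg_len, seg_len, dtype=torch.bool), 1)
    a_local = scores.masked_fill(causal, float("-inf")).softmax(-1) @ v

    # 3) Gate the memory readout against the local attention output.
    g = torch.sigmoid(beta)
    out = g * a_mem + (1.0 - g) * a_local

    # 4) Write this segment's keys/values into memory for future segments.
    sk = sigma(k)
    M = M + sk.T @ v
    z = z + sk.sum(0)
    return out, M, z

# Usage on two consecutive segments with a shared, growing memory:
d, L = 64, 16
M, z = torch.zeros(d, d), torch.zeros(d)
beta = torch.tensor(0.0)
for _ in range(2):
    q, k, v = torch.randn(L, d), torch.randn(L, d), torch.randn(L, d)
    out, M, z = infini_attention_segment(q, k, v, M, z, beta)
```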
💻 To follow along, you can use this Colab notebook:
- github.com/Blaizzy/Coding-LLM...
🎥 Coding Llama 2 from scratch video series
Part 1: czcams.com/users/liveXHmag4damTg
Part 2: czcams.com/users/liveLSWDpFmbE90
Part 3: Coding Llama 2 from sc...
This is a very thoughtful and great initiative! Researchers with enough gray matter but limited means can still be in the game. Thank you, PC!
Most welcome!
It's my pleasure :)
I lived through this so others donât have to.
This is very impressive and great content. Thank you!
You're very welcome!
Super impressive. Great value!
One question: how do I further train the model on my custom content instead of using LoRA? Can we do further full training of it to add new knowledge?
Most welcome!
You can do that, but it can be very expensive.
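To make the trade-off concrete, here is a minimal, hypothetical sketch of continued full-parameter training (every weight gets gradients, unlike LoRA, which trains small adapters on top of frozen base weights). The checkpoint name, corpus, and hyperparameters are placeholders; at 8B scale this generally needs multiple GPUs or offloading, which is where the cost comes from.

```python
# Hypothetical continued full-parameter training on custom text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Meta-Llama-3-8B"  # assumed checkpoint; swap in your own
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)
model.train()

# All parameters are trainable; LoRA would instead freeze these and
# learn low-rank adapters on top.
optim = torch.optim.AdamW(model.parameters(), lr=1e-5)

texts = ["your custom document here..."]  # placeholder corpus
for text in texts:
    batch = tok(text, return_tensors="pt", truncation=True, max_length=1024)
    out = model(**batch, labels=batch["input_ids"])  # causal LM loss
    out.loss.backward()
    optim.step()
    optim.zero_grad()
```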
Bro, how did you train Llama 3 without a paper?
Could you elaborate?