Layer Normalization in Transformers | Layer Norm Vs Batch Norm
- Added on 27. 06. 2024
- Layer Normalization is a technique used to stabilize and accelerate the training of transformers by normalizing the inputs across the features. It adjusts and scales the activations, ensuring consistent output distributions. This helps in reducing training time and improving model performance, making it a key component in transformer architectures.
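The description above can be sketched in a few lines of NumPy (a minimal illustration, not code from the video; the shapes and names are made up): each token's feature vector is normalized independently across the feature axis, then scaled and shifted by the learnable gamma and beta.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each token's feature vector independently:
    # mean and variance are taken across the last (feature) axis.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # gamma and beta are learnable per-feature scale and shift parameters.
    return gamma * x_hat + beta

# Example: a batch of 2 sequences, 3 tokens each, 4 features per token.
x = np.random.randn(2, 3, 4)
out = layer_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=-1))  # each token's features now have mean ~0
```

Note that the statistics depend only on a single token's own features, never on the rest of the batch, which is why the output distribution stays consistent regardless of batch size.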
Share your thoughts, experiences, or questions in the comments below. I love hearing from you!
============================
Did you like my teaching style?
Check my affordable mentorship program at : learnwith.campusx.in
DSMP FAQ: docs.google.com/document/d/1O...
============================
📱 Grow with us:
CampusX's LinkedIn: / campusx-official
CampusX on Instagram for daily tips: / campusx.official
My LinkedIn: / nitish-singh-03412789
Discord: / discord
E-mail us at support@campusx.in
✨ Hashtags✨
#deeplearning #campusx #transformers #transformerarchitecture
⌚Time Stamps⌚
00:00 - Intro
02:20 - What is Normalization
03:50 - What do we normalize?
05:30 - Benefits of Normalization in DL
07:10 - Internal Covariate Shift
12:49 - Batch Normalization Revision
22:56 - Why don't we use Batch Norm in Transformers?
38:25 - How does Layer Normalization work?
43:00 - Layer Normalization in Transformer
This playlist is like a time machine. I’ve watched you grow your hair from black to white, and I’ve seen the content quality continuously improve video by video. Great work!
I feel the same as well but I guess he's not that old
Another student added to the waiting list, asking for the next video. Thank you sir.
Please end this playlist as early as possible
This whole playlist is the best thing I discovered on YouTube! Thank you so much, sir
Respected Sir,
your playlist is the best. Kindly increase the frequency of videos.
Respected sir,
I request you to please complete the playlist. I am really thankful to you for your amazing videos in this playlist. I have recommended this playlist to a lot of my friends and they loved it too. Thanks for providing such content for free🙏🙏
Congratulations for building a 200k Family you deserve even more reach🎉❤
We love you sir ❤
Well, I am waiting for your next video. It's a gem of learning!
Congratulations for 200k sir 👏 🎉🍺
Thanks for this amazing series.
Please cover this entire Transformer architecture as soon as possible
this is really important topic. Thank you so much.
Please cover everything about Transformer architecture
Sir, try to complete this playlist as early as possible; you are the best teacher and we want to learn the deep learning concepts from you
Congratulations Brother for 200k users Family ... 👏👏👏
Congratulations for 200k subscribers!!!!!!!!!!!!!!!!!!
Thank you sir I am waiting for this video ❤
Amazing series full of knowledge...
Congrats on the 200k subs, love from Bangladesh ❤.
Sir please end this playlist fast placement season is nearby😢
It would be great if you make a video on RoPE
Very nice video
Sir, please upload videos regularly. These videos help me a lot.
Thank you Nitish, Waiting for your next upload.
Sir, can you kindly tell when this playlist will be complete?
Brother! Awaiting your upcoming course videos; please try to complete this playlist ASAP
Sir I can't describe your efforts Love from Pakistan
Please start MLOPs playlist as we are desperately waiting for.......
Can you give an estimate by when this playlist will be completed
i am the 300th person to like this video
Sir, please upload the next videos.
We are eagerly waiting.
I am glad that I found this Channel! can't thank you enough, Nitish Sir!
One more request: If you could create one-shot revision videos for machine learning, deep learning, and natural language processing (NLP).🤌
Thanks for this video sir. Can you also make a video on Rotary Positional Embeddings (RoPE) that is used in Llama as well as other LLMs for enhanced attention.
Sir can you please continue the 100 interview questions on ML playlist?
Please also continue with vision transformer
Thanks ❤
This is helpful 🖤
Please complete it quickly sir, waiting eagerly
thanks sir plse complete this playlist asap
Sir, is the transformer architecture part completed? I want to cover it ASAP; I have covered the topics till the attention mechanism.
I want to cover the topic in one go. Sir, please tell. And sir, I request you to upload all the videos ASAP. I want to learn a lot. Thanks for the amazing course at zero cost. God bless you.
Great 👍
when will you code transformer from scratch in pytorch
thanks ❤
Sir love you so much from Pakistan
Absolute banger video again. Appreciate the efforts you're taking for transformers. Cannot wait for when you explain the entire transformer architecture.
Also, congratulations for 200k subscribers. May you reach many more milestones
Kindly make video on Regex as well
what is regex?
at 46:10 ,why it is zero?
as beta is added so it will prevent it from becoming zero?
Waiting for the next video💌
Just ignoring padded rows while performing batch normalization should also work; I feel that padded zeros are not the only reason we use layer normalization instead of batch normalization.
how would you ignore padding cols in batch normalisation?
Sir, my doubt is: if I use batch norm in the transformer architecture, every value in the matrix has its own learning rate and bias factor.
So because of the bias, the output won't be driven to zero anyway, so why layer norm? Since we compute ((x-u)/var)*lambda + bias, the bias by itself would prevent it from becoming zero. Please help, sir.
Still, it will be a very small number; it will affect the result and not represent the true picture of the feature in batch normalization.
@@RamandeepSingh_04 Compared to the others without padding it will be small, but still sir wrote zero.
But it won't actually be exactly zero.
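The concern in this thread can be made concrete with a small NumPy sketch (the shapes and the amount of padding are made up for illustration): batch norm pools its statistics over the batch and sequence positions, so exact-zero padded rows drag each feature's pooled mean toward zero in proportion to how much padding there is.

```python
import numpy as np

# Two sequences of length 5 with 3 features; sequence 1 has 3 padded positions.
real = np.random.randn(2, 5, 3)
mask = np.ones((2, 5, 1))
mask[1, 2:] = 0.0          # positions 2..4 of sequence 1 are padding
x = real * mask            # padded positions become exact zeros

# Batch norm statistics: pooled over batch and sequence dims, one per feature.
bn_mean = x.reshape(-1, 3).mean(axis=0)

# Statistics computed over the real (unpadded) tokens only.
flat = x.reshape(-1, 3)
real_rows = mask.reshape(-1).astype(bool)
true_mean = flat[real_rows].mean(axis=0)

# The pooled mean equals the true mean scaled by the real-token fraction (7/10):
# the 3 zero rows add nothing to the sum but still inflate the count.
print(bn_mean, true_mean * 7 / 10)
```

So the pooled statistics are never exactly zero, but they are systematically biased by the padding fraction, which varies batch to batch; layer norm sidesteps this by never mixing tokens at all.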
Yeah!!
200k🎉
Sir,
In batch normalization, in your example we have three means and three variances, along with the same number of betas and gammas, i.e. 3.
But in layer normalization, we have eight means and eight variances, along with 3 betas and 3 gammas.
That means the number of betas and gammas is the same in both batch and layer normalization.
Is that correct? Please elaborate on it.
Yes
Mean and variance are used for normalization; beta and gamma are used for scaling and shifting.
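The shapes discussed in this exchange can be checked with a short NumPy sketch (8 tokens × 3 features, mirroring the example in the thread; the numbers are illustrative): the normalization statistics live on different axes in the two schemes, but gamma and beta are a per-feature vector in both.

```python
import numpy as np

# 8 tokens (rows), 3 features (columns), as in the example discussed.
x = np.random.randn(8, 3)

# Batch norm: one mean/variance per feature column -> 3 of each.
bn_mean, bn_var = x.mean(axis=0), x.var(axis=0)

# Layer norm: one mean/variance per token row -> 8 of each.
ln_mean, ln_var = x.mean(axis=1), x.var(axis=1)

# In both schemes, gamma and beta are learnable per-feature vectors of size 3.
gamma, beta = np.ones(3), np.zeros(3)

print(bn_mean.shape, ln_mean.shape, gamma.shape)  # (3,) (8,) (3,)
```

The means and variances are computed on the fly from the data, so their count can differ; only gamma and beta are learned parameters, which is why their count matches.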
Sir next video ❤❤
Please upload the next video quickly, sir
sir please complete the NLP playlist
which one?
how many videos does it have?
Sir, please update the PDF
Nitish, please relook at your covariate shift fundamentals... yes, you are partially correct, but the way you explained covariate shift is actually incorrect. (Example: imagine training a model to predict whether someone will buy a house based on features like income and credit score. If the model is trained on data from a specific city with a certain average income level, it might not perform well when used in a different city with a much higher average income. The distribution of "income" (a covariate) has shifted, and the model's understanding of its relationship to house buying needs to be adjusted.)
I guess the explanation sir gave and yours are the same, just with a different example of covariate shift
Bring some coding example bro
Sir please complete playlist I will pay 5000 for that
A video after 2 weeks in this playlist... don't be so cruel... please work a little faster, sir ji...
please be a little fast!
this is helpful 🖤