Layer Normalization in Transformers | Layer Norm Vs Batch Norm
- Added on 27. 06. 2024
- Layer Normalization is a technique used to stabilize and accelerate the training of transformers by normalizing the inputs across the features. It adjusts and scales the activations, ensuring consistent output distributions. This helps in reducing training time and improving model performance, making it a key component in transformer architectures.
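The description above can be sketched in a few lines of NumPy (a minimal illustration, not code from the video; the shapes and names are made up): each token's feature vector is normalized independently across the feature axis, then scaled and shifted by the learnable gamma and beta.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each token's feature vector independently:
    # mean and variance are taken across the last (feature) axis.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # gamma and beta are learnable per-feature scale and shift parameters.
    return gamma * x_hat + beta

# Example: a batch of 2 sequences, 3 tokens each, 4 features per token.
x = np.random.randn(2, 3, 4)
out = layer_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=-1))  # each token's features now have mean ~0
```

Note that the statistics depend only on a single token's own features, never on the rest of the batch, which is why the output distribution stays consistent regardless of batch size.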
Share your thoughts, experiences, or questions in the comments below. I love hearing from you!
============================
Did you like my teaching style?
Check my affordable mentorship program at : learnwith.campusx.in
DSMP FAQ: docs.google.com/document/d/1O...
============================
📱 Grow with us:
CampusX's LinkedIn: / campusx-official
CampusX on Instagram for daily tips: / campusx.official
My LinkedIn: / nitish-singh-03412789
Discord: / discord
E-mail us at support@campusx.in
✨ Hashtags✨
#deeplearning #campusx #transformers #transformerarchitecture
⌚Time Stamps⌚
00:00 - Intro
02:20 - What is Normalization
03:50 - What do we normalize?
05:30 - Benefits of Normalization in DL
07:10 - Internal Covariate Shift
12:49 - Batch Normalization Revision
22:56 - Why don't we use Batch Norm in Transformers?
38:25 - How does Layer Normalization work?
43:00 - Layer Normalization in Transformer
This playlist is like a time machine. I’ve watched you grow your hair from black to white, and I’ve seen the content quality continuously improve video by video. Great work!
I feel the same as well but I guess he's not that old
Another student added to the waiting list, asking for the next video. Thank you sir.
Please end this playlist as early as possible
This whole playlist is the best thing I discovered on YouTube! Thank you so much, sir
Respected Sir,
your playlist is the best. Kindly increase the frequency of videos.
Respected sir,
I request you to please complete the playlist. I am really thankful to you for your amazing videos in this playlist. I have recommended this playlist to a lot of my friends and they loved it too. Thanks for providing such content for free🙏🙏
Congratulations for building a 200k Family you deserve even more reach🎉❤
We love you sir ❤
Well, I am waiting for your next video. It's a gem of learning!
Congratulations for 200k sir 👏 🎉🍺
Thanks for this amazing series.
Please cover this entire Transformer architecture as soon as possible
this is really important topic. Thank you so much.
Please cover everything about Transformer architecture
Sir, try to complete this playlist as early as possible; you are the best teacher and we want to learn the deep learning concepts from you
Congratulations Brother for 200k users Family ... 👏👏👏
Congratulations for 200k subscribers!!!!!!!!!!!!!!!!!!
Thank you sir I am waiting for this video ❤
Amazing series full of knowledge...
Congrats on the 200k subs, love from Bangladesh ❤.
Sir please end this playlist fast placement season is nearby😢
It would be great if you make a video on RoPE
Very nice video
Sir, please upload videos regularly. These videos help me a lot.
Thank you Nitish, Waiting for your next upload.
Sir, can you kindly tell when this playlist will be complete?
Brother! Awaiting your upcoming course videos; please try to complete this playlist ASAP
Sir I can't describe your efforts Love from Pakistan
Please start MLOPs playlist as we are desperately waiting for.......
Can you give an estimate by when this playlist will be completed
i am the 300th person to like this video
Sir, please upload the next videos.
We are eagerly waiting.
I am glad that I found this Channel! can't thank you enough, Nitish Sir!
One more request: If you could create one-shot revision videos for machine learning, deep learning, and natural language processing (NLP).🤌
Thanks for this video sir. Can you also make a video on Rotary Positional Embeddings (RoPE) that is used in Llama as well as other LLMs for enhanced attention.
Sir can you please continue the 100 interview questions on ML playlist?
Please also continue with vision transformer
Thanks ❤
This is helpful 🖤
Please complete it quickly sir, waiting eagerly
thanks sir plse complete this playlist asap
Sir, is the transformer architecture part completed? I want to cover it ASAP; I have covered the topics till the attention mechanism.
I want to cover the topic in one go. Sir, please tell. And sir, I request you to upload all the videos ASAP. I want to learn a lot. Thanks for the amazing course at zero cost. God bless you.
Great 👍
when will you code transformer from scratch in pytorch
thanks ❤
Sir love you so much from Pakistan
Absolute banger video again. Appreciate the efforts you're taking for transformers. Cannot wait for when you explain the entire transformer architecture.
Also, congratulations for 200k subscribers. May you reach many more milestones
Kindly make video on Regex as well
what is regex?
at 46:10 ,why it is zero?
as beta is added so it will prevent it from becoming zero?
Waiting for the next video💌
Just ignoring padded rows while performing batch normalization should also work; I feel that padded zeros are not the only reason we use layer normalization instead of batch normalization.
how would you ignore padding cols in batch normalisation?
Sir, my doubt is: if I use batch norm in the transformer architecture, every value in the matrix has its own learning rate and bias factor.
So because of the bias, the output won't be driven to zero anyway, so why layer norm? Since we compute ((x-u)/var)*lambda + bias, the bias by itself would prevent it from becoming zero. Please help, sir.
Still, it will be a very small number; it will affect the result and not represent the true picture of the feature in batch normalization.
@@RamandeepSingh_04 Compared to the others without padding it will be small, but still sir wrote zero.
But it won't actually be exactly zero.
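The concern in this thread can be made concrete with a small NumPy sketch (the shapes and the amount of padding are made up for illustration): batch norm pools its statistics over the batch and sequence positions, so exact-zero padded rows drag each feature's pooled mean toward zero in proportion to how much padding there is.

```python
import numpy as np

# Two sequences of length 5 with 3 features; sequence 1 has 3 padded positions.
real = np.random.randn(2, 5, 3)
mask = np.ones((2, 5, 1))
mask[1, 2:] = 0.0          # positions 2..4 of sequence 1 are padding
x = real * mask            # padded positions become exact zeros

# Batch norm statistics: pooled over batch and sequence dims, one per feature.
bn_mean = x.reshape(-1, 3).mean(axis=0)

# Statistics computed over the real (unpadded) tokens only.
flat = x.reshape(-1, 3)
real_rows = mask.reshape(-1).astype(bool)
true_mean = flat[real_rows].mean(axis=0)

# The pooled mean equals the true mean scaled by the real-token fraction (7/10):
# the 3 zero rows add nothing to the sum but still inflate the count.
print(bn_mean, true_mean * 7 / 10)
```

So the pooled statistics are never exactly zero, but they are systematically biased by the padding fraction, which varies batch to batch; layer norm sidesteps this by never mixing tokens at all.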
Yeah!!
200k🎉
Sir,
In batch normalization, in your example we have three means and three variances, along with the same number of betas and gammas, i.e. 3.
But in layer normalization, we have eight means and eight variances, along with 3 betas and 3 gammas.
That means the number of betas and gammas is the same in both batch and layer normalization.
Is that correct? Please elaborate on it.
Yes
Mean and variance are used for normalization; beta and gamma are used for scaling and shifting.
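The shapes discussed in this exchange can be checked with a short NumPy sketch (8 tokens × 3 features, mirroring the example in the thread; the numbers are illustrative): the normalization statistics live on different axes in the two schemes, but gamma and beta are a per-feature vector in both.

```python
import numpy as np

# 8 tokens (rows), 3 features (columns), as in the example discussed.
x = np.random.randn(8, 3)

# Batch norm: one mean/variance per feature column -> 3 of each.
bn_mean, bn_var = x.mean(axis=0), x.var(axis=0)

# Layer norm: one mean/variance per token row -> 8 of each.
ln_mean, ln_var = x.mean(axis=1), x.var(axis=1)

# In both schemes, gamma and beta are learnable per-feature vectors of size 3.
gamma, beta = np.ones(3), np.zeros(3)

print(bn_mean.shape, ln_mean.shape, gamma.shape)  # (3,) (8,) (3,)
```

The means and variances are computed on the fly from the data, so their count can differ; only gamma and beta are learned parameters, which is why their count matches.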
Sir next video ❤❤
Please upload the next video quickly, sir
sir please complete the NLP playlist
which one?
how many videos does it have?
Sir, please update the PDF
Nitish, please relook at your covariate shift fundamentals... yes, you are partially correct, but the way you explained covariate shift is actually incorrect. (Example: imagine training a model to predict whether someone will buy a house based on features like income and credit score. If the model is trained on data from a specific city with a certain average income level, it might not perform well when used in a different city with a much higher average income. The distribution of "income" (a covariate) has shifted, and the model's understanding of its relationship to house buying needs to be adjusted.)
I guess the explanation sir gave and yours are the same, just with a different example of covariate shift
Bring some coding example bro
Sir please complete playlist I will pay 5000 for that
A video after 2 weeks in this playlist... don't be so cruel... please work a little faster, sir ji...
please be a little fast!
this is helpful 🖤