Your deep learning playlist is helping me so much, I have my AI/ML paper in 3 days 😭.. your videos are helping me a lot
which year?
@@hritikroshanmishra3630 Which college?
You are awesome 👍👏😊 Nitish, I have recently joined your channel... I'm learning so many good and new things, and that too in Hindi... it sticks straight in the mind... Thanks for being part of my data journey... love you 3000 ❤❤❤
Suppose we have an input X with n components and a linear neuron with random weights W that spits out an output Y. The output can be written as:
Y = W1X1 + W2X2 + ⋯ + WnXn
We know that the variance of WiXi is:
Var(WiXi) = E[Xi]^2 Var(Wi) + E[Wi]^2 Var(Xi) + Var(Wi) Var(Xi)
Here we assume that the Xi and Wi are all independently and identically distributed (Gaussian with zero mean), so the first two terms vanish, and we can work out the variance of Y:
Var(Y) = Var(W1X1 + W2X2 + ⋯ + WnXn) = Var(W1X1) + Var(W2X2) + ⋯ + Var(WnXn) = n Var(Wi) Var(Xi)
The variance of the output is the variance of the input, scaled by n Var(Wi). Hence, if we want the variance of Y to equal the variance of X, the term n Var(Wi) should equal 1, and the variance of the weights should be:
Var(Wi)=1/n(input)
This is the Xavier initialization formula. We need to pick the weights from a Gaussian distribution with zero mean and a variance of 1/n, where n is the number of input neurons in the weight tensor. That is how Xavier (Glorot) initialization is implemented in the Caffe library.
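A quick NumPy sketch (my own addition, not from the video) checking this numerically: with Var(Wi) = 1/n(input), the variance of Y stays close to the variance of X. The sizes n_in = 500 and 20,000 samples are arbitrary choices for the simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in = 500          # number of input components per neuron
samples = 20_000    # Monte Carlo samples

# X ~ N(0, 1) and W ~ N(0, 1/n_in), as in the derivation above
X = rng.normal(0.0, 1.0, size=(samples, n_in))
W = rng.normal(0.0, np.sqrt(1.0 / n_in), size=(samples, n_in))

# Y = W1*X1 + W2*X2 + ... + Wn*Xn, computed once per sample
Y = (W * X).sum(axis=1)

print(np.var(X))    # close to 1.0
print(np.var(Y))    # also close to 1.0 — variance preserved
```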
Similarly, if we go through backpropagation, we apply the same steps and get:
Var(Wi) = 1/n(output)
In order to keep the variance of the input and of the output gradient the same, these two constraints can only be satisfied simultaneously if n(input) = n(output). However, in general the n(input) and n(output) of a layer are not equal, so as a sort of compromise, Glorot and Bengio suggest using their average, proposing that:
Var(Wi)=1/n(avg)
where n(avg) = (n(input) + n(output))/2.
So, the idea is to initialize the weights from a Gaussian distribution with mean 0.0 and standard deviation:
σ = √(2/(n(input) + n(output)))
Note that when the number of input connections is roughly equal to the number of output connections, you get the simpler equation:
σ^2 = 1/n(input)
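The averaged formula above can be sketched as a small helper — a minimal NumPy version of my own, where the function name glorot_normal and the layer sizes 256 and 128 are just illustrative:

```python
import numpy as np

def glorot_normal(n_in, n_out, rng=None):
    """Sample an (n_in, n_out) weight matrix from N(0, 2/(n_in + n_out)),
    the Glorot/Xavier normal initialization derived above."""
    rng = rng if rng is not None else np.random.default_rng()
    sigma = np.sqrt(2.0 / (n_in + n_out))   # std dev, so variance = 2/(n_in + n_out)
    return rng.normal(0.0, sigma, size=(n_in, n_out))

W = glorot_normal(256, 128, np.random.default_rng(42))
print(W.shape)   # (256, 128)
print(W.std())   # close to sqrt(2/384)
```

Libraries like Keras expose the same scheme as a built-in initializer (e.g. GlorotNormal), so in practice you rarely hand-roll this.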
This playlist is amazing..........
Very informative
Thanks for your efforts.
I'm at that point in my academics where I feel lost without you XD
Sir, keep uploading. Big, big, big fan of your teaching, full support to you sir... your videos are helping me a lot... God bless you
Your deep learning playlist is helping me so much, please sir upload the CNN lectures as soon as possible
wish you and your channel keep growing!
Getting very good knowledge from this channel.... Jay Swaminarayan
Prince, brave lion, brother....😍
God bless you sir🙏you’re helping so many lives💯
Sir please don't stop to make videos
Your videos actually help us🙏
Thank You Sir.
Outstanding effort for everyone, especially for new students like me. Please tell me about Keras???? What date???
Thanks
Hey, will you be covering Momentum optimization, Nesterov accelerated gradient, AdaGrad, RMSProp, Adam, and Nadam optimization in future videos in your deep learning playlist?
Yes
@@campusx-official Please continue your DL series 🥺🥺🥺
Thank you so much sir
from Pakistan
God level teacher ❤️🤌🏻
best
5:00 Who remembers Xavier bhaiya?😂😂
How do I join your full course? In the 100 Days of Python playlist there aren't 100 videos... Is there any other site, or do you only upload videos on YouTube? I want to do a proper course.
It sounds like you have a runny nose. Take care, brother.
How do you upload your data to a Raspberry Pi or Arduino? Do I need to buy a laptop with a heavy graphics card, or can I simply go with a Dell Inspiron 14 5514 or HP Pavilion Aero 13?
Superb
Brother, please make videos on projects 🙏🏻
Hello brother, have you got a job?
Sir, please upload more videos
🙏🙏🙏🙏🙏🙏
Sir, I am not getting better or even equal results when I initialize by setting the weights manually, compared to setting them with the kernel initializer.