Xavier/Glorot and He Weight Initialization in Deep Learning

  • Published Sep 6, 2024

Comments • 35

  • @xijinping3267
@xijinping3267 2 years ago +15

Your deep learning playlist is helping me so much, I have my AI/ML paper in 3 days 😭.. your videos are helping me a lot

  • @ajityadav-db7qq
@ajityadav-db7qq 2 years ago +10

You are awesome 👍👏😊 Nitish, I have recently joined your channel... I'm getting to learn so many good and new things, and that too in Hindi... it sticks straight in the mind... Thanks for being part of my data journey... love you 3000 ❤❤❤

  • @SarangBanakhede
@SarangBanakhede 11 months ago +1

    Suppose we have an input X with n components and a linear neuron with random weights W that produces an output Y:
    Y = W_1 X_1 + W_2 X_2 + ... + W_n X_n
    The variance of each term W_i X_i is
    Var(W_i X_i) = E[X_i]^2 Var(W_i) + E[W_i]^2 Var(X_i) + Var(W_i) Var(X_i)
    If we assume that the X_i and W_i are all independently and identically distributed (Gaussian with zero mean), the first two terms vanish because E[X_i] = E[W_i] = 0, and since variances of independent terms add, the variance of Y works out to:
    Var(Y) = Var(W_1 X_1 + W_2 X_2 + ... + W_n X_n) = Var(W_1 X_1) + Var(W_2 X_2) + ... + Var(W_n X_n) = n Var(W_i) Var(X_i)
    So the variance of the output is the variance of the input, scaled by n Var(W_i). Hence, if we want the variance of Y to equal the variance of X, the term n Var(W_i) should equal 1, and the variance of each weight should be:
    Var(W_i) = 1 / n_in
    This is the Xavier initialization formula: pick the weights from a Gaussian distribution with zero mean and variance 1/n_in, where n_in is the number of input neurons of the weight tensor. That is how Xavier (Glorot) initialization is implemented in the Caffe library.
    Similarly, going through backpropagation and applying the same steps gives:
    Var(W_i) = 1 / n_out
    To keep the variance of the activations and of the gradients the same, these two constraints can be satisfied simultaneously only if n_in = n_out. In the general case the n_in and n_out of a layer may not be equal, so as a compromise Glorot and Bengio suggest using their average, proposing:
    Var(W_i) = 1 / n_avg, where n_avg = (n_in + n_out) / 2
    So the idea is to initialize weights from a Gaussian distribution with mean 0 and standard deviation
    σ = √(2 / (n_in + n_out))
    Note that when the number of input connections is roughly equal to the number of output connections, this reduces to the simpler form σ^2 = 1/n_in. (A small numerical check of this follows below.)
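
    A quick numerical check (a minimal NumPy sketch under the zero-mean i.i.d. assumptions above; the variable names are illustrative, not from the video):

        import numpy as np

        # Glorot-normal weights with sigma^2 = 2 / (n_in + n_out) should
        # roughly preserve activation variance through a linear layer.
        rng = np.random.default_rng(0)
        n_in, n_out = 512, 512            # equal fan-in/fan-out, so sigma^2 = 1/n_in

        sigma = np.sqrt(2.0 / (n_in + n_out))
        W = rng.normal(0.0, sigma, size=(n_out, n_in))   # zero-mean Gaussian weights

        X = rng.normal(0.0, 1.0, size=(n_in, 10000))     # unit-variance inputs
        Y = W @ X                                        # one linear layer, no bias

        print(X.var(), Y.var())   # both come out close to 1.0, as derived above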

  • @rb4754
@rb4754 3 months ago +1

    This playlist is amazing..........

  • @narendraparmar1631
@narendraparmar1631 6 months ago +2

    Very informative
    Thanks for your efforts.

  • @huzefaghadiyali5886
@huzefaghadiyali5886 2 years ago +8

I'm at that point in my academics where I feel lost without you XD

  • @shubhamhundet9680
@shubhamhundet9680 2 years ago +4

Sir, keep uploading. Big, big, big fan of your teaching, full support to you sir... your videos are helping me a lot... God bless you

  • @avishinde2929
@avishinde2929 2 years ago +6

Your deep learning playlist is helping me so much. Please sir, upload the CNN lectures as soon as possible

  • @paragbharadia2895
@paragbharadia2895 17 days ago

Wishing you and your channel continued growth!

  • @piyushsavani2277
@piyushsavani2277 a year ago +1

Getting very good knowledge from this channel.... Jay Swaminarayan

  • @mohammedamirjaved8418
@mohammedamirjaved8418 2 years ago +4

Prince, a lion-hearted brother....😍

  • @VarunMalik-mo6mr
@VarunMalik-mo6mr 6 months ago +1

God bless you sir 🙏 you're helping so many lives 💯

  • @technoboymyanmar5765
@technoboymyanmar5765 2 years ago +1

Sir, please don't stop making videos.
Your videos really help us 🙏

  • @ParthivShah
@ParthivShah 4 months ago +1

    Thank You Sir.

  • @farhatfatima1430
@farhatfatima1430 a year ago +1

Outstanding effort for everyone, especially for new students like me. Please tell me about Keras???? What's the date???

  • @mr.deep.
@mr.deep. 2 years ago +2

    Thanks

  • @huzefaghadiyali5886
@huzefaghadiyali5886 2 years ago +7

Hey, will you be covering momentum optimization, Nesterov accelerated gradient, AdaGrad, RMSProp, Adam, and NAdam optimization in future videos in the deep learning playlist?

  • @technicalhouse9820
@technicalhouse9820 6 months ago

Thank you so much sir
from Pakistan

  • @CODEToGetHer-rq2nf
@CODEToGetHer-rq2nf 9 months ago

    God level teacher ❤️🤌🏻

  • @yashjain6372
@yashjain6372 a year ago +1

    best

  • @Justme-dk7vm
@Justme-dk7vm 2 months ago +1

5:00 Who remembers XAVIER BHAIYA? 😂😂

  • @pragatisingh4711
@pragatisingh4711 2 years ago +2

How can I join your full course? In the 100-day Python learning playlist there aren't 100 videos... Is there any site, or do you only upload videos on YouTube and nowhere else? I want to do a proper course.

  • @footballkheli2189
@footballkheli2189 2 years ago +3

It sounds like you have a runny nose. Take care, brother.

  • @ParasProgramming123
@ParasProgramming123 2 years ago +2

How do you upload your data to a Raspberry Pi or Arduino? Do I need to buy a laptop with a heavy graphics card, or can I simply go with a Dell Inspiron 14 5514 or HP Pavilion Aero 13?

  • @ritesh_b
@ritesh_b a year ago

    Superb

  • @manishmaurya2365
@manishmaurya2365 2 years ago +3

Brother, please make a video on projects 🙏🏻

  • @okfine7909
@okfine7909 2 years ago +1

Sir, please upload more videos

  • @lokeshsharma4177
@lokeshsharma4177 5 months ago

    🙏🙏🙏🙏🙏🙏

  • @naveenpoliasetty954
@naveenpoliasetty954 a year ago

Sir, I am not getting better or even equal results when I initialize by setting the weights manually compared to setting them with a kernel initializer.
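
    One way to sanity-check this (a hedged TensorFlow/Keras sketch, not from the video; the layer sizes are illustrative): weights drawn manually from N(0, 2/(n_in + n_out)) follow the same distribution as kernel_initializer="glorot_normal", so single runs differ only because the random draws differ, and results should match on average across seeds.

        import numpy as np
        import tensorflow as tf

        n_in, n_out = 64, 32

        # Option 1: let Keras draw the weights.
        layer_a = tf.keras.layers.Dense(n_out, kernel_initializer="glorot_normal")
        layer_a.build((None, n_in))

        # Option 2: draw a comparable distribution manually and assign it.
        sigma = np.sqrt(2.0 / (n_in + n_out))
        W = np.random.normal(0.0, sigma, size=(n_in, n_out)).astype("float32")
        b = np.zeros(n_out, dtype="float32")
        layer_b = tf.keras.layers.Dense(n_out)
        layer_b.build((None, n_in))
        layer_b.set_weights([W, b])

        # Note: Keras's glorot_normal samples a *truncated* normal with
        # stddev sqrt(2 / (fan_in + fan_out)), so its sample std is slightly
        # smaller than sigma; the scale is otherwise the same.
        print(layer_a.kernel.numpy().std(), W.std())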