Weight Initialization explained | A way to reduce the vanishing gradient problem

  • Date added: 9. 07. 2024
  • Let's talk about how the weights in an artificial neural network are initialized, how this initialization affects the training process, and what YOU can do about it!
    To kick off our discussion on weight initialization, we're first going to discuss how these weights are initialized, and how these initialized values might negatively affect the training process. We'll see that these randomly initialized weights actually contribute to the vanishing and exploding gradient problem we covered in the last video.
    With this in mind, we'll then explore what we can do to influence how this initialization occurs. We'll see how Xavier initialization (also called Glorot initialization) can help combat this problem. Then, we'll see how we can specify how the weights for a given model are initialized in code using the kernel_initializer parameter for a given layer in Keras.
    Reference to original paper by Xavier Glorot and Yoshua Bengio:
    proceedings.mlr.press/v9/gloro...
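    A minimal sketch of the kernel_initializer usage described above (assuming TensorFlow's bundled Keras; the layer sizes and input shape here are illustrative, not the exact model from the video):

```python
# Illustrative only: layer sizes and input shape are placeholders.
from tensorflow import keras
from tensorflow.keras.layers import Dense

model = keras.Sequential([
    # 'glorot_uniform' is the Keras default; 'glorot_normal' is Xavier/Glorot
    # initialization drawn from a (truncated) normal distribution instead.
    Dense(16, input_shape=(5,), activation='relu', kernel_initializer='glorot_normal'),
    Dense(32, activation='relu', kernel_initializer='glorot_normal'),
    Dense(2, activation='softmax'),
])
model.summary()
```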
    🕒🩎 VIDEO SECTIONS 🩎🕒
    00:00 Welcome to DEEPLIZARD - Go to deeplizard.com for learning resources
    00:30 Help deeplizard add video timestamps - See example in the description
    09:42 Collective Intelligence and the DEEPLIZARD HIVEMIND
    đŸ’„đŸŠŽ DEEPLIZARD COMMUNITY RESOURCES đŸŠŽđŸ’„
    👋 Hey, we're Chris and Mandy, the creators of deeplizard!
    👉 Check out the website for more learning material:
    🔗 deeplizard.com
    đŸ’» ENROLL TO GET DOWNLOAD ACCESS TO CODE FILES
    🔗 deeplizard.com/resources
    🧠 Support collective intelligence, join the deeplizard hivemind:
    🔗 deeplizard.com/hivemind
    🧠 Use code DEEPLIZARD at checkout to receive 15% off your first Neurohacker order
    👉 Use your receipt from Neurohacker to get a discount on deeplizard courses
    🔗 neurohacker.com/shop?rfsn=648...
    👀 CHECK OUT OUR VLOG:
    🔗 / deeplizardvlog
    â€ïžđŸŠŽ Special thanks to the following polymaths of the deeplizard hivemind:
    Tammy
    Mano Prime
    Ling Li
    🚀 Boost collective intelligence by sharing this video on social media!
    👀 Follow deeplizard:
    Our vlog: / deeplizardvlog
    Facebook: / deeplizard
    Instagram: / deeplizard
    Twitter: / deeplizard
    Patreon: / deeplizard
    CZcams: / deeplizard
    🎓 Deep Learning with deeplizard:
    Deep Learning Dictionary - deeplizard.com/course/ddcpailzrd
    Deep Learning Fundamentals - deeplizard.com/course/dlcpailzrd
    Learn TensorFlow - deeplizard.com/course/tfcpailzrd
    Learn PyTorch - deeplizard.com/course/ptcpailzrd
    Natural Language Processing - deeplizard.com/course/txtcpai...
    Reinforcement Learning - deeplizard.com/course/rlcpailzrd
    Generative Adversarial Networks - deeplizard.com/course/gacpailzrd
    🎓 Other Courses:
    DL Fundamentals Classic - deeplizard.com/learn/video/gZ...
    Deep Learning Deployment - deeplizard.com/learn/video/SI...
    Data Science - deeplizard.com/learn/video/d1...
    Trading - deeplizard.com/learn/video/Zp...
    🛒 Check out products deeplizard recommends on Amazon:
    🔗 amazon.com/shop/deeplizard
    đŸŽ” deeplizard uses music by Kevin MacLeod
    🔗 / @incompetech_kmac
    ❀ Please use the knowledge gained from deeplizard content for good, not evil.

Comments • 126

  • @deeplizard (6 years ago, +11)

    Machine Learning / Deep Learning Tutorials for Programmers playlist:
    czcams.com/play/PLZbbT5o_s2xq7LwI2y8_QtvuXZedL6tQU.html
    Keras Machine Learning / Deep Learning Tutorial playlist:
    czcams.com/play/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL.html
    Data Science for Programming Beginners playlist:
    czcams.com/play/PLZbbT5o_s2xo_SRS9wn9OSs_kzA9Jfz8k.html

    • @l33tc0d3 (6 years ago)

      Your explanation is great -- can you please make videos using PyTorch on your channel?

    • @VinayKumar-hy6ee (5 years ago)

      Why are the random weights normally distributed with a mean of 0? What is the intuition behind it?

  • @golangshorts (4 years ago, +39)

    God sent you to help machine learning learners.

  • @shauryr (5 years ago, +34)

    This series is a hidden gem! You deserve more views!! Thank you

  • @aaryannakhat1842 (3 years ago, +3)

    Just amazing!
    I'll always be thankful to you for providing us these astounding videos!

  • @Iamine1981 (3 years ago, +9)

    You have put together a great, concise accessible series for the uninitiated in the field of Deep Machine Learning. Thank you!

  • @DannyJulian77 (6 years ago, +4)

    This video is perfect, THANK YOU SO MUCH!

  • @manjeetnagi (2 years ago, +1)

    I have never come across a video that explains this concept so well. Awesome!

  • @prashanthvaidya (3 years ago, +1)

    This playlist is really good. Grateful to you for your effort! :)

  • @tymothylim6550 (3 years ago, +1)

    Thank you very much for this video! I really enjoyed learning about initialization and how it connects with everything else! Great that it can be used to tackle the vanishing/exploding gradient problem!

  • @diegodonofrio (5 years ago, +1)

    Thank you a lot for the detailed explanation!

  • @parismollo7016 (4 years ago, +1)

    Amazing video as always! Thank you for your contribution to the machine learning community, it's very valuable and we learn a lot from you.

  • @torgoron9695 (4 years ago, +17)

    Some explanatory remarks on the equation var(weights) = 2 / (n_in + n_out):
    The var(weights) = 1/n rule mentioned in the video (or 2/n when using ReLU) turns out to work well for the forward pass of the input (the magnitude of the activations is kept approximately constant). However, with this, the problem of vanishing gradients still exists during backpropagation. From the perspective of backpropagation it would be ideal to have var(weights) = 1 / n_{nextLayer}. Thus, the article by Glorot and Bengio (2010) settles on a compromise: var(weights) = 2 / (n + n_{nextLayer}), where n and n_{nextLayer} are the numbers of neurons in the layers before and after the weights.
    This is from a lecture I attended on deep neural networks.

    • @shreyajain1224 (4 years ago)

      Overall, which initialization technique/formula should we use?

    • @torgoron9695 (4 years ago, +1)

      @@shreyajain1224 I'm not sure anymore. But if I remember correctly, you initialize from a normal distribution and specify its SD as explained in my comment above (you of course need to convert that variance into an SD by taking its square root).
      Maybe @deeplizard can help?
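      A small NumPy sketch of that recipe (an illustration, not the exact implementation from the paper or from Keras): draw weights from a zero-mean normal whose variance is 2 / (n_in + n_out), i.e. whose SD is the square root of that value.

```python
import numpy as np

def glorot_normal(n_in, n_out, rng):
    # Target variance 2 / (n_in + n_out)  ->  SD = sqrt(2 / (n_in + n_out))
    std = np.sqrt(2.0 / (n_in + n_out))
    return rng.normal(loc=0.0, scale=std, size=(n_in, n_out))

rng = np.random.default_rng(0)
W = glorot_normal(250, 100, rng)
print(W.var())  # close to 2 / (250 + 100) ≈ 0.0057
```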

  • @Waleed-qv8eg (6 years ago, +8)

    WOOW I love the new style! PLEASE add more videos!
    your way of explanation is so clear.
    Thanks.
    KEEP IT UP!

    • @deeplizard (6 years ago, +1)

      Thank you! I'm glad you're liking the new style. 😎
      New videos are coming soon. Hoping to release the next one tonight!

    • @deeplizard (6 years ago, +2)

      And here we go!
      czcams.com/video/HetFihsXSys/video.html

  • @longle3928 (6 years ago, +1)

    Thank you for the video, very clear and easy to understand

  • @AkshayRoyal (4 years ago, +1)

    Awesome work. Keep doing it !!

  • @loneWOLF-fq7nz (5 years ago, +1)

    Never stop teaching us !

  • @Viralvlogvideos (4 years ago, +1)

    You're an angel to me. Thanks for saving my time and for reducing my stress levels in understanding these concepts.

  • @fernandobaladi6636 (2 years ago, +1)

    Thank you very much! Now I understand better what goes on with Xavier initialization!

  • @DanielWeikert (6 years ago, +28)

    This series is really awesome. Thanks a lot for the detailed explanations. This certainly took a lot of effort, so I (and I guess everyone else who is watching and subscribing to this) highly appreciate that. Could you also cover the other layers in Keras, e.g. Embedding, TimeDistributed, ...? :)

    • @deeplizard (6 years ago, +2)

      Appreciate that, Daniel! You're welcome.
      Thanks for the suggestions. I'll make sure these items are included on my list of potential topics for future videos!

    • @DanielWeikert (6 years ago)

      Great. Thanks a lot!

  • @moritzpainz1839 (4 years ago)

    This YouTube channel is a blessing from god, if there is one :D. By sharing your knowledge in such an easy way, you are seriously doing so much GOOD. Thank you

  • @harishpawar2546 (4 years ago)

    The explanation given is great. I was expecting things in more depth, but that's fine. Now I have clarity on what I need to dissect, and I will definitely explore the content on this channel. Love and respect from India. Keep up the good work :)

  • @raviteja5125 (5 years ago, +2)

    The explanation is so perfect and clear that there are no dislikes in the video!!. Loved your voice and way of explanation :)

  • @KatS909 (5 years ago, +2)

    very clear explanation :D thanks!

  • @joaoramalho4107 (3 years ago, +1)

    You are literally saving my life :)

  • @whiteF0x9091 (5 years ago, +1)

    Great videos ! Thanks !

  • @asdfasdfuhf (5 years ago, +10)

    I love your voice, it's so soothing to listen to.

  • @shreyajain1224 (4 years ago, +1)

    clean and simple explanation thanks

  • @KatarDan (6 years ago, +4)

    You are my new hero

  • @radouanebouchou7488 (4 years ago)

    Thank you very much, this is such a clear explanation, it is very helpful.

  • @KiranSharma-ey6xp (a year ago, +1)

    Nicely explained

  • @ashwanikumarm9027 (2 years ago)

    Thanks much for the video

  • @CosmiaNebula (3 years ago)

    0:22 intro
    0:54 how weights matter
    2:48 bad weights can cause vanishing gradient
    4:44 heuristic for initial weight
    7:24 keras code

  • @pike1991able (4 years ago)

    Great video :-) I have thoroughly enjoyed it, and I recommend your series to my circle. Out of curiosity, does the vanishing gradient also depend on the activation function you choose, and if yes, which activation functions make this issue happen less? Any suggestions based on your experience! Many thanks.

  • @sorrefly (3 years ago)

    This is extremely useful
    What I think is missing is the mathematical demonstration

  • @tumul1474 (4 years ago, +2)

    thank you mam ! you are awesome

  • @saileshpatra2488 (4 years ago, +1)

    Loved this❀
    Please make a video on how ResNet helps in solving the problem of vanishing and exploding gradients??????

  • @user-ps4yg3ez8n (11 months ago)

    thank you so much for this wonderful explanation. What is the default weight initialization in PyTorch?

  • @tamoorkhan3262 (3 years ago, +1)

    Before this insightful vid, I didn't know Keras was silently doing so much for us without us even knowing. :D (y)

  • @jamesang7861 (3 years ago, +5)

    I don't get this... "if the desired output for our activation function is on the opposite side from where it's saturated, then during training when SGD updates the weights and attempts to influence the activation output, it will only make very small changes in the value of this activation output, barely even moving it in the right direction."

    • @amithka9593 (3 years ago)

      I guess it means that, for some odd reason, your random initialization of weights (while being completely random) would always trigger an activation of 1 instead of 0 (in this case, 0 is what is needed) - it would mean that the network would never learn.
      In very layman's terms, the random weights give rise to a standard deviation that always makes the activation fire a neuron incorrectly from what *learning* desires.

  • @junqichen6241 (3 years ago, +1)

    Thank you for this amazing content. However, I'm having trouble understanding how the fact that z can take on values significantly higher than 1 is going to make SGD very slow. I guess I'm not following this statement - "if the desired output for our activation function is on the opposite side from where it's saturated, then during training when SGD updates the weights and ...". Could you clarify this a little bit more? Thanks a lot!

  • @Max-lr6dk (4 years ago, +2)

    But variance = sum((x - x_mean)^2) / N, so it won't be 250 but 1.
    And also, idk about var(z) = 250; it seems to assume that x != 1, more like x = 1 * (a random number between -1 and 1).

  • @sourabhkhandelwal689 (5 years ago, +2)

    Mind-boggling, spellbound, one of the best Computer Science (not just AI/ML) channels on YouTube, hands down. I have watched your videos on Deep Reinforcement Learning and Conv Nets as well; they are both masterpieces too (although I wish you had continued your series on Deep Reinforcement Learning, teaching other topics like Actor-Critic and more). And on top of that, you are funny as hell.
    Thanks for these videos.

    • @deeplizard (5 years ago, +2)

      Spellbound... I like it! Thank you, Sourabh! Glad to hear you're finding value in all of the content and that you get my humor too :D Note that the RL series is still being developed with more content to come!

  • @erfanhasanpoor_zarayabi9338 (4 years ago, +1)

    Thanks for the video, I have a question. When we multiply the weights by 1/sqrt(250), the variance problem at the activation function is solved, but the weights then have a mean around zero and a standard deviation around 1/sqrt(250), so the weights are smaller than one and the vanishing problem can occur for the first layers' derivatives. Can you help me understand this problem?

  • @udayannath7091 (4 years ago)

    If I initialize the weights, does it mean that the confusion matrix is the same every time?
    I faced a problem regarding the confusion matrix: when I run my classification code for a credit approval data set, it shows a different confusion matrix every time. I fixed the randomization in the training and test set splitting.

  • @yuxiaofei3442 (6 years ago, +2)

    glorious

  • @yepnah3514 (3 years ago)

    How do we change the values of these weights and biases on each layer?

  • @ravirajyaguru5905 (3 years ago)

    Hi! I have been learning a lot from your awesome video explanations, so first of all, thank you for that.
    Secondly, I have a question about input_shape. At 7:30, the parameter input_shape = (1,5) means that the input layer has 2 dimensions with 1 and 5 elements respectively. I am still not clear on that.
    Thanks in advance.

  • @aliasgarzakir4779 (4 years ago)

    Can you please explain why there is a 2 instead of a 1 for the ReLU activation function?

  • @VinayKumar-hy6ee (5 years ago)

    Why are the random weights normally distributed with a mean of 0? What is the intuition behind it?

  • @shreejanshrestha1931 (4 years ago)

    In input_shape=(1,5), what does the 1 stand for, and what does the 5 stand for?

  • @JimmyCheng (5 years ago)

    Very nice video as always! Just have a quick question here: if we use the Glorot initializer in Keras, would it adjust to the activation function we are using? For instance, 2/n or 1/n for ReLU or sigmoid respectively.

    • @deeplizard (5 years ago)

      Hey Ziqiang - According to their docs, it appears they've implemented glorot initialization to always use the 2 / (n_in + n_out) method mentioned at 6:38. keras.io/initializers/#glorot_normal
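      Since the Keras Glorot initializers don't adapt to the layer's activation, one option (a suggestion that goes beyond the video, not something it covers) is to request He initialization explicitly on ReLU layers; it uses the 2 / n_in variance rule, while Glorot keeps 2 / (n_in + n_out):

```python
from tensorflow import keras
from tensorflow.keras.layers import Dense

model = keras.Sequential([
    # He initialization (variance ≈ 2 / n_in) is the usual pick for ReLU layers...
    Dense(32, input_shape=(5,), activation='relu', kernel_initializer='he_normal'),
    # ...while Glorot/Xavier (variance = 2 / (n_in + n_out)) suits sigmoid/tanh outputs.
    Dense(1, activation='sigmoid', kernel_initializer='glorot_normal'),
])
```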

  • @bergamobobson9649 (3 years ago)

    I guess if we take the weights from a different distribution, the classical CLT will come in handy to retrieve the same result as in the Xavier initialization... or am I wrong?

  • @philipharman6846 (3 years ago)

    Love your videos! What are some "symptoms" that indicate your model is being affected by vanishing gradients? Is there a way to determine this, aside from overall accuracy scores?

    • @zardouayassir7359 (a year ago, +1)

      When vanishing gradients occur, one symptom is that the weights in the first layers get smaller updates, or none at all, compared to the weights in the later (deeper) layers.
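      One way to check for that symptom directly (a sketch assuming tf.keras / TF 2.x; model, loss_fn, x_batch, and y_batch are placeholders you would supply) is to print per-variable gradient norms for a single batch and compare the early layers against the deeper ones:

```python
import tensorflow as tf

# model, loss_fn, x_batch, y_batch are assumed to already exist.
with tf.GradientTape() as tape:
    preds = model(x_batch, training=True)
    loss = loss_fn(y_batch, preds)

grads = tape.gradient(loss, model.trainable_variables)
for var, grad in zip(model.trainable_variables, grads):
    # Much smaller norms in the first layers than in the deeper ones
    # suggest the gradients are vanishing on the way back.
    print(var.name, float(tf.norm(grad)))
```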

  • @vijayendrasdm (4 years ago)

    Thanks for the awesome video.
    I have a couple of doubts.
    1. Suppose I am training an MLP and I have data that I divided into batches b_1, b_2, ..., b_n, and say I use Xavier weight initialization. After I do backpropagation over batch b_1, or after a few batches, we update the weights, roughly in proportion to the gradients. There is a good chance that this distorts the weight distribution (i.e., it is not Xavier any more). How do you guarantee that learning still takes place after one or a few backpropagation steps?
    2. To avoid vanishing/exploding gradients, our weights need to stay around 1 (e.g., 0.99 or 1.001), and this has to hold at every stage of learning. My question is: if that statement is true, then aren't we restricting the weights, and is learning actually taking place?

  • @raven5165 (4 years ago)

    I wonder, can we say that accumulated variance can contribute to the vanishing gradient when we have ReLU as the activation?
    Also, can it contribute to the exploding gradient?

  • @morpheus1586 (a year ago)

    What's the process when backpropagating?

  • @ashutoshshah864 (3 years ago)

    This new intro theme music is much better.

  • @sonalisyngal50 (5 years ago, +2)

    Hi, great videos! I have a recommendation: Would be great if you could take us through Gradient Boosting in detail, just the way you have done with ANNs. Looking forward

  • @justchill99902 (5 years ago, +2)

    Hello! You kicked ass!
    Question - 3:44 "If the desired output for our activation function is on the opposite side from where it saturated, then during training when SGD updates the weights in an attempt to influence the activation output, it will only make very small changes in the value of this activation output."
    What does this mean? how is it related ?

    • @chetan22goel (4 years ago)

      I also have a similar question: what does saturation mean here? A little guidance would be really helpful.
      Thank you for creating such awesome videos.

    • @raven5165 (4 years ago, +1)

      The slope of the sigmoid is very small when its input is either very large or very small, so its derivative is very small there and it contributes to the vanishing gradient.
      But these days ReLU is used far more often than sigmoid, and this is not the case with ReLU.
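      A quick numerical illustration of that saturation point (a sketch, not from the video): the sigmoid derivative sigmoid(x) * (1 - sigmoid(x)) peaks at 0.25 near x = 0 and is nearly zero for large |x|, which is what leaves SGD with almost no signal.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for x in [0.0, 2.0, 5.0, 10.0]:
    s = sigmoid(x)
    print(f"x={x:5.1f}  sigmoid={s:.6f}  derivative={s * (1 - s):.6f}")
# At x=0 the derivative is 0.25; at x=10 it is ~4.5e-05, so updates barely move the output.
```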

  • @lucas.n (5 months ago)

    3:45 I believe there's an error in the sigmoid function? Shouldn't it be: fn(x) = 1/(1+e^-x)?

    • @deeplizard (5 months ago)

      The two expressions are equal to each other :)
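      For reference, the algebra behind that equality (assuming the on-screen form was e^x / (1 + e^x)): multiply the numerator and denominator by e^{-x}.

```latex
\frac{e^{x}}{1 + e^{x}}
  = \frac{e^{x}}{1 + e^{x}} \cdot \frac{e^{-x}}{e^{-x}}
  = \frac{1}{e^{-x} + 1}
  = \frac{1}{1 + e^{-x}}
```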

  • @rafibasha4145 (2 years ago, +1)

    3:06, please explain how we got 250 as the variance.

  • @bensaidlucas6563 (3 years ago)

    In the Glorot initialization, I don't understand why this condition should hold to avoid the vanishing/exploding gradient problem:
    "we need the gradients to have equal variance before and after flowing through a layer in the reverse direction"
    Thank you

    • @user-ep2hv7sv3q (29 days ago)

      I think it's because if the variance increases, the gradients would have to explode to keep up, and they shrink if the variance decreases.

  • @meetayan15 (6 years ago)

    By saying the variance of random numbers, you are actually implying the variance of random variables, right?
    I have one more query: does input_shape=(1,5) mean there are 1x5 = 5 input nodes?

    • @deeplizard (6 years ago)

      Hey Ayan - Yes to both questions.
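      A tiny check of that answer (a sketch assuming tf.keras; the Dense layer size of 3 is arbitrary): with input_shape=(1, 5), each sample is a 1 x 5 array, so the layer sees 1 x 5 = 5 input values and its kernel has 5 rows.

```python
from tensorflow import keras
from tensorflow.keras.layers import Dense

model = keras.Sequential([
    Dense(3, input_shape=(1, 5), activation='relu'),  # each sample has shape (1, 5)
])
model.summary()
print(model.layers[0].kernel.shape)  # (5, 3): Dense acts on the last axis, i.e. on 5 inputs
```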

    • @VinayKumar-hy6ee (5 years ago)

      deeplizard, why are the random weights normally distributed with a mean of 0? What is the intuition behind it?

  • @hsa1727 (3 years ago, +1)

    What about the "bios"? I don't get it... you didn't talk about bios and I'm so confused... is it the same for tuning bios as well?

    • @im-Anarchy (9 months ago, +1)

      bias

    • @hsa1727 (9 months ago)

      @@im-Anarchy helpful đŸ«€

  • @roros2512 (5 years ago, +1)

    I don't get how var(z) = 250, some help please?
    Thank you for the videos, I've been watching this list over the last week, and then I'll watch the one about PyTorch.
    Thank you again for this great work, you are helping a lot of students.

    • @karthikd4278 (5 years ago, +2)

      I am also not clear on this part. Can someone help us understand this?

    • @luismisanmartin98 (5 years ago)

      She said that in this particular case we have 250 nodes in the input layer, all with a value of 1. If that is the case, then z is the sum of all the weights. The weights are normally distributed numbers with variance 1, and the variance of a sum of independent random variables is the sum of their variances. Since the variance of each weight is 1, the variance of z is then 1+1+1+... as many times as there are weights. Since there are 250 nodes in the input layer, there are 250 weights, and hence the variance is 1+1+1+1+... = 250.

    • @skadoosh7398 (4 years ago)

      @@luismisanmartin98 The variance of each weight, which is a single number, should be zero, right?

    • @luismisanmartin98 (4 years ago)

      @@skadoosh7398 what do you mean? All weights are single numbers taken from a distribution with a certain mean and variance.

    • @skadoosh7398 (4 years ago)

      You had written that the variance of each weight is 1. By the term 'each weight', do you mean a single element of the weight matrix? For example, let the weight matrix be [1, -1]; it has mean 0 and variance 1. So do you mean that the variance of each weight is 1, i.e. var(1) = 1 and var(-1) = 1? I think I am missing something important. Please clarify.
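      A short simulation of the argument in this thread (an illustration only; the 'variance of each weight' refers to the distribution each weight is drawn from, not to any single drawn number): with 250 inputs fixed at 1 and weights drawn from a standard normal, z is the sum of the 250 weights, and its variance across many random initializations comes out near 250.

```python
import numpy as np

rng = np.random.default_rng(42)
n_inputs = 250
x = np.ones(n_inputs)  # 250 input nodes, all with value 1

# Many independent weight initializations; z = w . x for each draw.
z = np.array([rng.standard_normal(n_inputs) @ x for _ in range(100_000)])
print(z.mean(), z.var())  # mean ≈ 0, variance ≈ 250
```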

  • @gamma_v1 (6 years ago)

    The audio quality is not great. My laptop's speaker volume and the YouTube volume slider are at the highest level, but I can barely hear the sound (to be fair, there is some background noise at my place, but still the audio should be way louder).

    • @deeplizard (6 years ago)

      Thanks again for the feedback! We've been experimenting with the audio. Still tuning it trying to get it right. Check out one of the recently released videos, like the one below, and I'd appreciate your feedback on the audio quality on that one. Also, out of curiosity, are you listening through headphones? czcams.com/video/HEQDRWMK6yY/video.html

    • @gamma_v1 (6 years ago)

      That video seems better, but could be louder. I'm not listening through headphones. Thanks

    • @deeplizard (6 years ago)

      Thanks for checking it out and letting me know your thoughts!

  • @anujsharma8938 (3 years ago)

    What is the value of a node?
    How would you define the 'value of a node'?
    Why does a node have any value?

    • @deeplizard (3 years ago)

      Start at the beginning of the course to have these points clarified:
      deeplizard.com/learn/playlist/PLZbbT5o_s2xq7LwI2y8_QtvuXZedL6tQU

    • @anujsharma8938 (3 years ago)

      @@deeplizard Yes, I have started from the 1st lecture of this series. What I don't get is whether the 'value of a node' is the input of the input node, the activation output of the input node, or some kind of constant in the activation function.

    • @deeplizard (3 years ago)

      The layer we focus on in this episode is the input layer, which consists only of the raw input data. In this case, we're saying the raw input data is a structure of length 250 with each element in that structure being of value 1.

    • @anujsharma8938 (3 years ago)

      @@deeplizard Thank you 💕💕 for your patience, but
      I got all of those things except this statement: 'each element in that structure being of value 1'. I get that the element here is an input node and the structure is the input layer, but what I don't get is 'node of value 1'.
      How can an input node have any value? Is that value '1' an assumed input value or an assumed output value?
      What do you mean by 'input node of value 1'?

    • @deeplizard (3 years ago)

      The input value and output value of the input layer is the same. The input layer is made up only of the raw input data that is being passed to the network. In this case, we suppose we pass input data as a structure of length 250 with each element in that structure being of value 1. This is the input data. The data flows from the input layer to the next layer via connected weights. The input to an "input node" is the same as its output. It is simply the input data. This is not the case for subsequent layers, only for the input layer.

  • @ashishkumar-fk8rh (4 years ago)

    Wow.... new logo effect...

  • @anandachetanelikapati6388 (3 years ago, +1)

    {
    "question": "Can you identify a way \"Keras\" is helping to quickly train a network?
    (Select the most appropriate answer)
    ",
    "choices": [
    "Keras defaults weight initializater attribute to 'glorot_uniform', which protects the variance from becoming larger.",
    "Exploiting the GPUs.",
    "Self adjusting the learning factor.",
    "Keras automatically drops few neurons to reduce the load in training."
    ],
    "answer": "Keras defaults weight initializater attribute to 'glorot_uniform', which protects the variance from becoming larger.",
    "creator": "Anandachetan Elikapati",
    "creationDate": "2020-08-19T12:01:45.585Z"
    }

    • @deeplizard (3 years ago)

      Thanks, Anandachetan! I changed the wording just a bit, but your question has now been added to deeplizard.com/learn/video/8krd5qKVw-Q :)

  • @oguzhanbaser1389 (4 years ago)

    Hey, the content is dope! Sincerely, though, I would probably prefer a less distracting background than the one you've used. Thx btw, thumbs up..

    • @deeplizard (4 years ago, +1)

      Agree :) We stopped using background noise during explanations in later episodes.

  • @zihanqiao2850 (4 years ago, +1)

    A start like a game loading, a beautiful woman's voice

  • @user-kp1jr8td9f (2 years ago)

    4:40

  • @EDeN99 (4 years ago)

    Really awesome series, thank you. I must say your voice is very low in this particular episode. Please try speaking louder.

  • @paulbloemen7256 (5 years ago)

    For a more or less normal, medium-sized, WELL TRAINED neural network:
    - What is the actual range (actual values for maximum and minimum) of the weights and the biases between the input layer and the first hidden layer, between the last hidden layer and the output layer, and between the hidden layers?
    - What is the range of means and standard deviations for the three collections of values mentioned?
    - Does it make sense to use these characteristics of weights and biases to set the random initial values for the weights and biases of the respective layers? Or does one always have to use the Xavier initialization for weights anyway, but not for biases, to allow for a sensible growth path towards the final values for weights and biases?
    I would truly appreciate an answer to these questions, thank you very much!

  • @salahfc7971 (3 years ago)

    0:31 - 0:52 -> Contents of the video.

  • @rey1242 (5 years ago)

    Oh.
    So let's call a bunch of weights coming from ALL the neurons in the previous layer into ONE neuron in the next layer a SET.
    So it's the SET that needs to have a variance of 1/N?
    I thought the weights that needed this variance were the weights that go from ONE neuron to ALL the neurons of the next layer.

    • @deeplizard (5 years ago, +1)

      Hey Reyner - Yes, it's the set of weights you described in your top sentence that needs to have variance of 1/N.

    • @rey1242 (5 years ago, +1)

      @@deeplizard thx for the enlightenment

  • @EDeN99 (4 years ago)

    Something you haven't explained so far in the series: in building the first layer, the parameter "input_shape" has not been properly explained. Why does it take two values, and what does each one mean?
    I know it denotes our input data, but how do the two values it takes work, and when do we ignore one of them?
    Thanks in advance

  • @edilgin622 (3 years ago)

    I write my own neural net, so no initialization is done for me :D

  • @mayurkulkarni755 (6 years ago)

    Wow a video on machine learning in the middle of advertisements

    • @deeplizard (6 years ago, +5)

      A lizard's gotta eat!

    • @mayurkulkarni755 (6 years ago, +1)

      deeplizard Yeah, I can totally understand, but 3 ads in a short video really hinder what you're trying to understand, especially if it's a complex topic. You could try to follow 3b1b: they don't show ads for the first month so that users can watch ad-free at first. Also try to use YouTube SEO tools to increase viewers. I see your channel has a lot of potential. Good luck :)

    • @deeplizard (6 years ago, +1)

      Hey Mayur - Thanks for the feedback and suggestions. Really appreciate it!

  • @commelephenix4702 (3 years ago)

    Speak louder please.

  • @arnavsood2116 (5 years ago)

    IF IT WAS POSSIBLE I WOULD MAKE YOU MY GIRL.
    PEOPLE WANT MODELS
    I WANT SOMEONE LIKE YOU WHO TEACHES YOU THROUGH LIFE