Tutorial 7 - Vanishing Gradient Problem

  • Published 21. 07. 2019
  • The vanishing gradient problem occurs when we try to train a neural network model using gradient-based optimization techniques. It was a major problem 10 years back for training deep neural network models, because it led to very long training times and degraded model accuracy. (A short numeric illustration follows this description.)
    Below are the various playlists created on ML, Data Science and Deep Learning. Please subscribe and support the channel. Happy Learning!
    Deep Learning Playlist: • Tutorial 1- Introducti...
    Data Science Projects playlist: • Generative Adversarial...
    NLP playlist: • Natural Language Proce...
    Statistics Playlist: • Population vs Sample i...
    Feature Engineering playlist: • Feature Engineering in...
    Computer Vision playlist: • OpenCV Installation | ...
    Data Science Interview Question playlist: • Complete Life Cycle of...
    You can buy my book on Finance with Machine Learning and Deep Learning from the below url
    amazon url: www.amazon.in/Hands-Python-Fi...
    🙏🙏🙏🙏🙏🙏🙏🙏
    YOU JUST NEED TO DO
    3 THINGS to support my channel
    LIKE
    SHARE
    &
    SUBSCRIBE
    TO MY YOUTUBE CHANNEL
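
A short numeric illustration of the description above (editor's sketch in Python; the 10-layer depth and unit weights are illustrative assumptions, not the exact network from the video): with sigmoid activations, each layer contributes a derivative factor of at most 0.25 to the backpropagated gradient, so the gradient reaching the earliest layers shrinks roughly like 0.25^n.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # never exceeds 0.25

# The gradient of an early-layer weight is (roughly) a product of one
# activation derivative per layer times the connecting weights.
n_layers = 10
local_grads = sigmoid_prime(np.zeros(n_layers))  # best case: 0.25 each
weights = np.full(n_layers, 1.0)                 # assume weights near 1
print(np.prod(local_grads * weights))            # ~9.5e-07 -> vanishing gradient
```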

Comments • 195

  • @kumarpiyush2169
    @kumarpiyush2169 3 years ago +122

    Hi Krish, shouldn't dL/dW'11 be [dL/dO21 . dO21/dO11 . dO11/dW'11] +
    [dL/dO21 . dO21/dO12 . dO12/dW'11], as per the last chain-rule illustration? Please confirm.

    • @rahuldey6369
      @rahuldey6369 3 years ago +12

      ...but O12 is independent of W11; in that case, won't the 2nd term be zero?

    • @RETHICKPAVANSE
      @RETHICKPAVANSE 3 years ago +1

      wrong bruh

    • @ayushprakash3890
      @ayushprakash3890 3 years ago +2

      we don't have the second term

    • @Ajamitjain
      @Ajamitjain 3 years ago +1

      Can anyone clarify this? I too have this question.

    • @grahamfernando8775
      @grahamfernando8775 3 years ago +29

      @@Ajamitjain dL/dW'11 should be [dL/dO21 . dO21/dO11 . dO11/dW'11]
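
A worked restatement of the correction this thread converges on (editor's note, using the video's W'11/O11/O21 labels): because O12 does not depend on W'11, only one path contributes to the gradient.

```latex
% Only one path connects W'_{11} to the loss, since O_{12} is independent of W'_{11}:
\frac{\partial L}{\partial W'_{11}}
  = \frac{\partial L}{\partial O_{21}}
  \cdot \frac{\partial O_{21}}{\partial O_{11}}
  \cdot \frac{\partial O_{11}}{\partial W'_{11}}
```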

  • @Xnaarkhoo
    @Xnaarkhoo 3 years ago +15

    Many years ago in college I enjoyed watching videos from IIT - before the MOOC era. India had, and still has, many good teachers! It brings me joy to see that again. It seems Indians have a gene for pedagogy.

  • @mahabir05
    @mahabir05 4 years ago +34

    I like how you explain and how you end your class with "never give up". It's very encouraging.

  • @tosint
    @tosint 4 years ago +11

    I hardly comment on videos, but this is a gem. One of the best videos explaining the vanishing gradient problem.

  • @Vinay1272
    @Vinay1272 a year ago +6

    I have been taking a well-known, world-class course on AI and ML for the past 2 years, and none of the lecturers have made me as interested in any topic as you have in this video. This is probably the first time I have sat through a 15-minute lecture without distracting myself. What I realise now is that I didn't lack motivation or interest, nor was I lazy - I just did not have lecturers whose teaching inspired me enough to take an interest in the topics; yours did.
    You have explained the vanishing gradient problem very well and very clearly. It shows how strong your concepts are and how knowledgeable you are.
    Thank you for putting out your content here and sharing your knowledge with us. I am so glad I found your channel. Subscribed forever.

  • @PeyiOyelo
    @PeyiOyelo 4 years ago +43

    Sir, or as my Indian friends say, "Sar", you are a very good teacher, and thank you for explaining this topic. It makes a lot of sense. I can also see that you're very passionate; however, the passion kind of makes you speed up the explanation a bit, making it a bit hard to understand sometimes. I am also very guilty of this when I try to explain things that I love. Regardless, thank you very much for this and the playlist. I'm subscribed ✅

    • @amc8437
      @amc8437 3 years ago +3

      Consider reducing playback speed.

  • @lekjov6170
    @lekjov6170 4 years ago +36

    I just want to add this mathematically, the derivative of the sigmoid function can be defined as:
    *derSigmoid = x * (1-x)*
    As Krish Naik well said, we have our maximum when *x=0.5*, giving us back:
    *derSigmoid = 0.5 * (1-0.5) --------> derSigmoid = 0.25*
    That's the reason the derivative of the sigmoid function can't be higher than 0.25

    • @ektamarwaha5941
      @ektamarwaha5941 4 years ago

      COOL

    • @thepsych3
      @thepsych3 4 years ago

      cool

    • @tvfamily6210
      @tvfamily6210 4 years ago +13

      should be: derSigmoid(x) = Sigmoid(x)[1-Sigmoid(x)], and we know it reaches maximum at x=0. Plugging in: Sigmoid(0)=1/(1+e^(-0))=1/2=0.5, thus derSigmoid(0)=0.5*[1-0.5]=0.25

    • @benvelloor
      @benvelloor 3 years ago

      @@tvfamily6210 Thank you!

    • @est9949
      @est9949 3 years ago

      I'm still confused. The weight w should be in here somewhere. This seems to be missing w.
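
A small sketch tying this thread together (editor's addition; values are illustrative): the 0.25 bound is for the derivative of the sigmoid with respect to its input, and the weight w enters separately, as the extra factor each layer contributes in the chain rule.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-10, 10, 10001)
d_sigmoid = sigmoid(z) * (1.0 - sigmoid(z))  # derivative w.r.t. the input z
print(d_sigmoid.max())                       # ~0.25, attained at z = 0

# In backpropagation each hidden layer contributes roughly sigmoid'(z) * w,
# so the small sigmoid'(z) factors shrink the gradient unless the weights are large.
```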

  • @ltoco4415
    @ltoco4415 4 years ago +6

    Thank you sir for making this confusing concept crystal clear. Your knowledge is GOD level 🙌

  • @aidenaslam5639
    @aidenaslam5639 4 years ago +3

    Great stuff! Finally understand this. Also loved it when you dropped the board eraser

  • @marijatosic217
    @marijatosic217 3 years ago +3

    I am amazed by the level of energy you have! Thank you :)

  • @rushikeshmore8890
    @rushikeshmore8890 4 years ago

    Kudos sir, I am working as a data analyst; I have read lots of blogs and watched videos, but today I finally got the concept clear. Thanks for all the stuff.

  • @vikrantchouhan9908
    @vikrantchouhan9908 2 years ago +2

    Kudos to your genuine efforts. One needs sincere efforts to ensure that the viewers are able to understand things clearly and those efforts are visible in your videos. Kudos!!! :)

  • @gultengorhan2306
    @gultengorhan2306 2 years ago +1

    You are teaching better than many other people in this field.

  • @al3bda
    @al3bda 3 years ago +1

    Oh my god, you are a good teacher. I really love how you explain and simplify things.

  • @benvelloor
    @benvelloor 3 years ago +1

    Very well explained. I can't thank you enough for clearing all my doubts!

  • @manujakothiyal3745
    @manujakothiyal3745 4 years ago +1

    Thank you so much. The amount of effort you put is commendable.

  • @bhavikdudhrejiya4478
    @bhavikdudhrejiya4478 4 years ago

    Very nice way to explain.
    Learned from this video:
    1. Getting the error: (actual output - model output)^2
    2. Now we have to reduce the error, i.e. backpropagation: we have to find a new weight for each connection
    3. Finding the new weight: new weight = old weight - change in the weight
    4. Change in the weight = learning rate x d(error)/d(old weight)
    5. The new weight comes out almost equal to the old weight, because the derivative of the sigmoid ranges between 0 and 0.25, so there is effectively no update to the weight
    6. This is the vanishing gradient (a tiny numeric example of steps 3-5 follows this comment)
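
A tiny numeric sketch of steps 3-5 above (editor's addition; the learning rate and gradient value are made-up illustrative numbers):

```python
learning_rate = 0.01
old_weight = 0.80

# Gradient that reaches an early layer after many sigmoid-derivative factors
d_error_d_weight = 2.5e-6  # illustrative, tiny value

new_weight = old_weight - learning_rate * d_error_d_weight
print(old_weight, new_weight)  # 0.8 vs 0.799999975 -> effectively no update
```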

  • @deepthic6336
    @deepthic6336 4 years ago

    I must say this: normally I am the kind of person who prefers to study on my own and crack it. I never used to listen to any lectures because I just didn't understand them, and I dislike the way they explain without passion (not all, though). But you are a gem, and I can see the passion in your lectures. You are the best, Krish Naik. I appreciate it and thank you.

  • @elielberra2867
    @elielberra2867 a year ago

    Thank you for all the effort you put into your explanations, they are very clear!

  • @koraymelihyatagan8111
    @koraymelihyatagan8111 2 years ago

    Thank you very much, I was wandering around the internet to find such an explanatory video.

  • @sumeetseth22
    @sumeetseth22 4 years ago

    Love your videos, I have watched and taken many courses but no one is as good as you

  • @piyalikarmakar5979
    @piyalikarmakar5979 2 years ago

    One of the best videos clarifying the vanishing gradient problem. Thank you, sir.

  • @MrSmarthunky
    @MrSmarthunky 4 years ago

    Krish.. You are earning a lot of Good Karmas by posting such excellent videos. Good work!

  • @MauiRivera
    @MauiRivera 3 years ago

    I like the way you explain things, making them easy to understand.

  • @classictremonti7997
    @classictremonti7997 3 years ago

    So happy I found this channel! I would have cried if I found it and it was given in Hindi (or any other language than English)!!!!!

  • @himanshubhusanrath2492

    One of the best explanations of vanishing gradient problem. Thank you so much @KrishNaik

  • @sapnilpatel1645
    @sapnilpatel1645 a year ago +1

    so far best explanation about vanishing gradient.

  • @classictremonti7997
    @classictremonti7997 3 years ago

    Krish...you rock brother!! Keep up the amazing work!

  • @venkatshan4050
    @venkatshan4050 2 years ago +1

    Marana mass explanation🔥🔥. Simple and very clearly said.

  • @satyadeepbehera2841
    @satyadeepbehera2841 4 years ago +3

    Appreciate your way of teaching, which answers fundamental questions. This "derivative of sigmoid ranging from 0 to 0.25" concept was not mentioned anywhere else. Thanks for clearing up the basics...

  • @MsRAJDIP
    @MsRAJDIP 4 years ago +2

    Tomorrow I have an interview; I'm clearing all my doubts with your videos 😊

  • @swapwill
    @swapwill 4 years ago

    The way you explain is just awesome

  • @mittalparikh6252
    @mittalparikh6252 3 years ago +1

    Overall I got the idea that you are trying to convey. Great work

  • @skiran5129
    @skiran5129 2 years ago

    I'm lucky to have found this wonderful class. Thank you.

  • @adityashewale7983
    @adityashewale7983 11 months ago

    Hats off to you sir, your explanation is top level. Thank you so much for guiding us...

  • @b0nnibell_
    @b0nnibell_ 4 years ago

    you sir made neural network so much fun!

  • @meanuj1
    @meanuj1 5 years ago +4

    Nice presentation..so much helpful...

  • @vishaljhaveri6176
    @vishaljhaveri6176 2 years ago

    Thank you, Krish SIr. Nice explanation.

  • @hiteshyerekar9810
    @hiteshyerekar9810 5 years ago +4

    Nice video Krish. Please make practical, hands-on videos on gradient descent, CNN, and RNN.

  • @nabeelhasan6593
    @nabeelhasan6593 2 years ago

    Very nice video sir , you explained very well the inner intricacies of this problem

  • @daniele5540
    @daniele5540 4 years ago +1

    Great tutorial man! Thank you!

  • @sunnysavita9071
    @sunnysavita9071 4 years ago

    your videos are very helpful ,good job and good work keep it up...

  • @YashSharma-es3lr
    @YashSharma-es3lr 3 years ago

    Very simple and nice explanation. I understood it on the first watch.

  • @benoitmialet9842
    @benoitmialet9842 2 years ago +1

    Thank you so much, great quality content.

  • @shmoqe
    @shmoqe 2 years ago

    Great explanation, Thank you!

  • @yoyomemory6825
    @yoyomemory6825 3 years ago +1

    Very clear explanation, thanks for the upload.. :)

  • @yousufborno3875
    @yousufborno3875 4 years ago

    You should get an Oscar for your teaching skills.

  • @nola8028
    @nola8028 2 years ago

    You just earned a +1 subscriber ^_^
    Thank you very much for the clear and educative video

  • @naresh8198
    @naresh8198 a year ago

    crystal clear explanation !

  • @skviknesh
    @skviknesh 3 years ago +1

    I understood it. Thanks for the great tutorial!
    My query is:
    the gradient vanishes as more layers are added, and when new weight ~= old weight the training becomes useless.
    What would the output of such a model look like, and will we even reach the global minimum?

  • @narayanjha3488
    @narayanjha3488 4 years ago +1

    This video is amazing and you are an amazing teacher; thanks for sharing such amazing information.
    By the way, where are you from - Bangalore?

  • @faribataghinezhad3293
    @faribataghinezhad3293 2 years ago

    Thank you sir for your amazing video. that was great for me.

  • @nazgulzholmagambetova1198

    great video! thank you so much!

  • @aishwaryaharidas2100
    @aishwaryaharidas2100 4 years ago

    Should we again add bias to the product of the output from the hidden layer O11, O12 and weights W4, W5?

  • @maheshsonawane8737
    @maheshsonawane8737 a year ago

    Very nice, now I understand why the weights don't update in RNNs. The main point is that the derivative of the sigmoid is between 0 and 0.25; the vanishing gradient here is tied to the sigmoid function. 👋👋👋👋👋👋👋👋👋👋👋👋

  • @anandemani5472
    @anandemani5472 4 years ago

    Hi Krish, can we declare convergence when the weight updates become smaller than 0.0001?

  • @Joe-tk8cx
    @Joe-tk8cx a year ago

    Great video. One question: when you calculate the new weight as old weight - learning rate x derivative of loss with respect to the weight, is that derivative of loss w.r.t. the weight the sigmoid function?

  • @susmitvengurlekar
    @susmitvengurlekar 3 years ago

    Understood completely! If the weights hardly change, there is no point in training again and again. But I have a question: where can I use this knowledge and understanding I just acquired?

  • @shahidabbas9448
    @shahidabbas9448 4 years ago +1

    Sir, I'm really confused about the actual y value; please can you explain it? I thought it would be our input value, but here there are many input values with one predicted output.

  • @arunmeghani1667
    @arunmeghani1667 3 years ago

    great video and great explanation

  • @tonnysaha7676
    @tonnysaha7676 3 years ago

    Thank you thank you thank you sir infinite times🙏.

  • @BalaguruGupta
    @BalaguruGupta 3 years ago

    Thanks a lot sir for the wonderful explanation :)

  • @neelanshuchoudhary536
    @neelanshuchoudhary536 4 years ago +1

    very nice explanation,,great :)

  • @nirmalroy1738
    @nirmalroy1738 4 years ago

    super video...extremely well explained.

  • @magicalflute
    @magicalflute 4 years ago

    Very well explained. The vanishing gradient problem, as per my understanding, is that the optimizer is not able to do its job (reduce the loss) because the old and new weights end up almost equal. Please correct me if I am wrong. Thanks!!

  • @krishj8011
    @krishj8011 3 years ago

    Very nice series... 👍

  • @nikunjlahoti9704
    @nikunjlahoti9704 a year ago

    Great Lecture

  • @melikad2768
    @melikad2768 3 years ago +1

    Thank youuuu, its really great:)

  • @AdarshSingh-nb2ql
    @AdarshSingh-nb2ql 2 years ago

    I have one doubt: if we use the sigmoid only in the last layer, then with repeated forward and backward propagation, won't that still shrink the derivative of the loss function into the 0-0.25 range?

  • @prerakchoksi2379
    @prerakchoksi2379 4 years ago +1

    I am doing a deep learning specialization, and I feel this is much better than that.

  • @muhammadarslankahloon7519

    Hello sir, why is the chain rule explained in this video different from the previous chain rule video? Kindly clarify, and thanks for such an amazing series on deep learning.

  • @ambreenfatimah194
    @ambreenfatimah194 3 years ago

    Helped a lot....thanks

  • @narsingh2801
    @narsingh2801 4 years ago

    You are just amazing. Thnx

  • @sunnysavita9071
    @sunnysavita9071 4 years ago

    very good explanation.

  • @abdulqadar9580
    @abdulqadar9580 2 years ago

    Great efforts Sir

  • @GunjanGrunge
    @GunjanGrunge 2 years ago

    that was very well explained

  • @sekharpink
    @sekharpink 5 years ago +33

    You specified the derivative of loss with respect to W'11 incorrectly; you missed the derivative of loss with respect to O21 in the equation. Please correct me if I am wrong.

    • @sekharpink
      @sekharpink 5 years ago

      Please reply

    • @ramleo1461
      @ramleo1461 4 years ago

      Even I have this doubt

    • @krishnaik06
      @krishnaik06 4 years ago +28

      Apologies for the delay...I just checked the video and yes I have missed that part.

    • @ramleo1461
      @ramleo1461 4 years ago +12

      @@krishnaik06 Hey!
      You don't have to apologise; on the contrary, you are doing us a favour by uploading these useful videos. I was a bit confused and wanted to clear my doubt, that's all. Thank you for the videos... Keep up the good work!!

    • @rajatchakraborty2058
      @rajatchakraborty2058 4 years ago

      @@krishnaik06 I think you have also missed the w12 part in the derivative. Please correct me if I am wrong

  • @ngelospapoutsis9389
    @ngelospapoutsis9389 4 years ago

    So if we have 2 layers and, as we know, 1 forward and backward pass is 1 epoch: if we now have 100 epochs, does the derivative get smaller every time? Or is the vanishing problem due to many hidden layers and not dependent on the number of epochs?

  • @varayush
    @varayush 3 years ago

    @krish: thanks for the wonderful lessons on neural networks. May I request that you correct the equation with a text box overlaid on the video, so that the information you want to pass on stays intact?

  • @khiderbillal9961
    @khiderbillal9961 3 years ago +1

    Thanks sir, you really helped me

  • @dhananjayrawat317
    @dhananjayrawat317 4 years ago

    best explanation. Thanks man

  • @abhinavkaushik6817
    @abhinavkaushik6817 2 years ago

    Thank you so much for this

  • @manikosuru5712
    @manikosuru5712 5 years ago +4

    As usual, extremely good, outstanding...
    And a small request: can we expect this DP in coding (Python) in the future?

  • @gowthamprabhu122
    @gowthamprabhu122 4 years ago +1

    Can someone please explain why the derivative at each earlier (parent) layer reduces? i.e. why does layer two have a lower derivative of output with respect to its input?

  • @_jiwi2674
    @_jiwi2674 3 years ago

    you meant that the derivative of the sigmoid is between 0 and 0.25, right? I wanted to clarify about that range written in red color. The sigmoid of z would be between 0 and 1, from what I understood. Any reply will be appreciated :)

  • @sandipansarkar9211
    @sandipansarkar9211 4 years ago

    Thanks Krish. The video was superb, but I have an apprehension that I might get lost somewhere. Please provide some reading references on this topic for a beginner. Cheers

  • @salimtheone
    @salimtheone a year ago

    very well explained 100/100

  • @gaurawbhalekar2006
    @gaurawbhalekar2006 4 years ago

    excellent explanation sir

  • @hokapokas
    @hokapokas 5 years ago +4

    Good job bro, as usual... Keep up the good work. I have a request: please make a video on implementing backpropagation.

    • @krishnaik06
      @krishnaik06 5 years ago +1

      The video has already been made. Please have a look at my deep learning playlist.

    • @hokapokas
      @hokapokas 5 years ago

      @@krishnaik06 I have seen that video, but it's not implemented in Python. If you have a notebook you can refer me to, please share it.

    • @krishnaik06
      @krishnaik06 5 years ago +3

      With respect to the implementation in Python, please wait till I upload some more videos.

  • @rahuldey6369
    @rahuldey6369 3 years ago

    I'm a bit confused about whether the 'O's are the weighted sums or the activations of the weighted sums. If they are the activations of the weighted sums, say 'a', and the weighted sums are 'z', then won't it be like dL/dw = dL/da * da/dz * dz/dw?

  • @Haraharavlogs
    @Haraharavlogs 5 months ago

    You are a legend, Naik sir

  • @aaryankangte6734
    @aaryankangte6734 2 years ago

    Sir, thank you for teaching us all the concepts from the basics. Just one request: if there is a mistake in your videos, please rectify it, as it confuses a lot of people who watch them - not everyone sees the comment section, and they may just blindly believe what is said. Please look into this.

  • @Thriver21
    @Thriver21 a year ago

    nice explanation.

  • @AnirbanDasgupta
    @AnirbanDasgupta 3 years ago

    excellent video

  • @AA-yk8zi
    @AA-yk8zi 3 years ago

    Thank you so much

  • @feeham
    @feeham 2 years ago

    Thank you !!

  • @amitdebnath2207
    @amitdebnath2207 a month ago

    Hats Off Brother

  • @ArthurCor-ts2bg
    @ArthurCor-ts2bg 4 years ago

    Excellent 👌

  • @gautam1940
    @gautam1940 4 years ago +3

    This is an interesting fact to know. Makes me curious to see how ReLU overcame this problem
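
For readers with the same curiosity, a brief sketch (editor's addition, not from the video): ReLU's derivative is 1 for positive inputs, so the per-layer factors in the backpropagated product do not shrink the way sigmoid's at-most-0.25 factors do (though ReLU units can still "die" if their inputs stay negative).

```python
import numpy as np

def relu_prime(z):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise
    return (z > 0).astype(float)

z = np.array([-2.0, -0.5, 0.3, 1.7, 4.0])
print(relu_prime(z))  # [0. 0. 1. 1. 1.]

# Ten active ReLU layers keep a factor of 1.0 per layer,
# versus at most 0.25 per layer for the sigmoid.
print(np.prod(np.ones(10)), 0.25 ** 10)  # 1.0 vs ~9.5e-07
```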

  • @gopalakrishna9510
    @gopalakrishna9510 4 years ago

    On what basis is the number of hidden layers chosen?