Tutorial 6-Chain Rule of Differentiation with BackPropagation

  • Uploaded on 18. 07. 2019
  • In this video we will discuss the chain rule of differentiation, which is the basic building block of backpropagation.
    Below are the various playlists created on ML, Data Science and Deep Learning. Please subscribe and support the channel. Happy Learning!
    Complete Deep Learning: • Tutorial 1- Introducti...
    Data Science Projects playlist: • Generative Adversarial...
    NLP playlist: • Natural Language Proce...
    Statistics Playlist: • Population vs Sample i...
    Feature Engineering playlist: • Feature Engineering in...
    Computer Vision playlist: • OpenCV Installation | ...
    Data Science Interview Question playlist: • Complete Life Cycle of...
    You can buy my book on Finance with Machine Learning and Deep Learning from the URL below:
    Amazon URL: www.amazon.in/Hands-Python-Fi...
    🙏 You just need to do 3 magical things: LIKE, SHARE & SUBSCRIBE to my YouTube channel. 📚

Comments • 240

  • @debtanudatta6398
    @debtanudatta6398 3 years ago +183

    Hello Sir, I think there is a mistake in this video for backpropagation. To find dL/dw11^2 we don't need the PLUS part, since O22 doesn't depend on w11^2. Please look into that. The PLUS part is needed while calculating dL/dw11^1, where O21 and O22 both depend on O11, and O11 depends on w11^1. (See the numeric check at the end of this thread.)

    • @alinawaz8147
      @alinawaz8147 2 years ago +2

      Yes brother, there is a mistake; what he said is correct.

    • @prakharagrawal4011
      @prakharagrawal4011 2 years ago +2

      Yes, This is correct. Thank you for pointing this out.

    • @aaryankangte6734
      @aaryankangte6734 2 years ago +2

      true that

    • @vegeta171
      @vegeta171 1 year ago +1

      You are correct about that, but I think he wanted to take the derivative w.r.t. O11, since it is present in both nodes f21 and f22; if we replace w11^2 in the equation by O11, the equation would be correct.

    • @byiringirooscar321
      @byiringirooscar321 1 year ago +1

      It took me time to understand, but now I get the point. Thanks, man. I can assure you that @krish naik is the first professor I have
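
    A minimal numeric check of the point raised in this thread, assuming the small network from the video (two neurons in each hidden layer feeding a single output f31) with sigmoid activations; all numeric values below are made up for illustration, not taken from the lecture:

      # Single-path chain rule for dL/dw11^2 vs. a finite-difference estimate.
      from math import exp

      def sigmoid(z):
          return 1.0 / (1.0 + exp(-z))

      # hidden-layer-1 outputs, target, and layer-2 / layer-3 weights (illustrative values)
      O11, O12, y = 0.6, 0.3, 1.0
      w11_2, w21_2 = 0.4, -0.2     # weights into f21
      w12_2, w22_2 = 0.1, 0.5      # weights into f22
      w11_3, w12_3 = 0.7, -0.3     # weights into f31

      def forward(w11_2):
          O21 = sigmoid(w11_2 * O11 + w21_2 * O12)
          O22 = sigmoid(w12_2 * O11 + w22_2 * O12)   # does not involve w11^2
          O31 = sigmoid(w11_3 * O21 + w12_3 * O22)
          return O21, O22, O31

      O21, O22, O31 = forward(w11_2)
      L = (y - O31) ** 2

      # one path only: dL/dw11^2 = dL/dO31 * dO31/dO21 * dO21/dw11^2
      dL_dO31   = -2.0 * (y - O31)
      dO31_dO21 = O31 * (1.0 - O31) * w11_3
      dO21_dw   = O21 * (1.0 - O21) * O11
      chain_rule = dL_dO31 * dO31_dO21 * dO21_dw

      # nudge w11^2: O22 is untouched, so no second ("PLUS") path is needed
      eps = 1e-6
      _, _, O31_plus = forward(w11_2 + eps)
      finite_diff = ((y - O31_plus) ** 2 - L) / eps

      print(chain_rule, finite_diff)   # the two values agree closely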

  • @OMPRAKASH-uz8jw
    @OMPRAKASH-uz8jw 1 year ago +2

    You are nothing short of the perfect teacher; keep on adding playlists.

  • @ksoftqatutorials9251
    @ksoftqatutorials9251 5 years ago +5

    I don't need to calculate a loss function for your videos, and there's no need to propagate the video back and forward, i.e. you explained it in the easiest way I have ever seen from anyone. Keep doing more; I'm looking forward to learning more from you. Thanks a ton.

  • @VVV-wx3ui
    @VVV-wx3ui 4 years ago

    This is simply yet superbly explained. When I learnt this earlier, it stopped at backpropagation. Now I've learnt what it is in backpropagation that updates the weights in an appropriate way, i.e., the chain rule. Thanks much for giving clarity that is easy to understand. Superb.

  • @abhishek-shrm
    @abhishek-shrm 4 years ago +1

    This video explained everything I needed to know about backpropagation. Great video sir.

  • @aj_actuarial_ca
    @aj_actuarial_ca 1 year ago +1

    Your videos are really helping me learn machine learning as an actuarial student coming from a pure commerce/finance background.

  • @manateluguabbaiinuk-mahanu761

    Deep Learning Playlist concepts are very clear and anyone can understand easily. Really have to appreciate your efforts 👏🙏

  • @tarun4705
    @tarun4705 1 year ago +3

    This is the clearest mathematical explanation I have seen so far.

    • @moksh5743
      @moksh5743 7 months ago

      czcams.com/video/Ixl3nykKG9M/video.html

  • @akumatyy
    @akumatyy 3 years ago +9

    Fantastic, sir. I am watching your videos after watching Andrew Ng's deep learning lectures. I will say you explained it even more simply. Superb.

  • @ganeshvhatkar9040
    @ganeshvhatkar9040 4 months ago +1

    One of the best videos I have seen in my life!!

  • @rajeeevranjan6991
    @rajeeevranjan6991 4 years ago +6

    simply one word "Great"

  • @mranaljadhav8259
    @mranaljadhav8259 4 years ago +1

    Well explained, sir! Before starting deep learning, I decided to start learning from your videos. You explain in a very simple way... anyone can understand from your videos. Keep it up, sir :)

  • @nishitnishikant8548
    @nishitnishikant8548 2 years ago +44

    Of the two connections from f11 to the second hidden layer, w11^2 affects only f21 and not f22 (which is affected by w21^2). So dL/dw11^2 should have only one term instead of two.
    Anyone, please correct me if I am wrong.

    • @sahilvohra8892
      @sahilvohra8892 2 years ago +3

      I agree. I don't know why others didn't notice this same mistake!!!

    • @mustaphaelammari1128
      @mustaphaelammari1128 2 years ago +3

      I agree, I was looking for someone with the same remark :)

    • @ismailhossain5114
      @ismailhossain5114 2 years ago +3

      That's the point I was actually looking for.

    • @saqueebabdullah9142
      @saqueebabdullah9142 2 years ago +4

      Exactly, because if I work out the derivative with the two terms it gives dL/dw11^2 = dL/dw11^2 + dL/dw12^2, which is wrong.

    • @RUBAYATKHAN89
      @RUBAYATKHAN89 2 years ago +3

      Absolutely.

  • @AmitYadav-ig8yt
    @AmitYadav-ig8yt 4 years ago +30

    It has been years since I solved a mathematics question paper or looked at a mathematics book, but the way you explained this was far better than Ph.D. professors at the university. I did not feel far away from mathematics at all. LoL, I do not understand my professors, but I understand you perfectly.

  • @RomeshBorawake
    @RomeshBorawake 3 years ago +18

    Thank you for the perfect DL playlist to learn from. Wanted to highlight one change to make it 100% useful (already at 99.99%):
    13:04 - for every epoch, the loss decreases as the weights adjust toward the global minimum (see the sketch at the end of this thread).

    • @vishnukce
      @vishnukce 8 months ago

      But for negative slopes the loss has to increase, no, to reach the global maxima?
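
    A tiny sketch of the point being discussed, using an illustrative loss L(w) = (w - 2)^2 rather than the network's actual loss: with the update w_new = w_old - lr * dL/dw, a negative slope makes the weight increase, but the loss still moves down toward the global minimum in both cases.

      def loss(w):
          return (w - 2.0) ** 2

      def grad(w):
          return 2.0 * (w - 2.0)

      lr = 0.1
      for w0 in (5.0, -1.0):            # slope is positive at w=5, negative at w=-1
          w = w0
          for epoch in range(5):
              w = w - lr * grad(w)      # negative slope => w increases, loss still drops
          print(w0, "->", round(w, 3), "loss:", round(loss(w), 4))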

  • @aditideepak8033
    @aditideepak8033 3 years ago +1

    You have explained it very well. Thanks a lot!

  • @shrutiiyer68
    @shrutiiyer68 3 years ago +1

    Thank you so much for all your efforts to give such an easy explanation🙏

  • @someshanand1799
    @someshanand1799 3 years ago +1

    Great video, especially since you give the concept behind it; love it. Thank you for sharing it with us.

  • @MrityunjayD
    @MrityunjayD 3 years ago

    Really appreciate the way you taught the chain rule... awesome.

  • @varunsharma1331
    @varunsharma1331 11 months ago

    Great explanation. I had been looking for this clarity for a long time...

  • @manjunath.c2944
    @manjunath.c2944 4 years ago +1

    Clearly understood; much appreciation for your effort :)

  • @ZIgoTTo10000
    @ZIgoTTo10000 2 years ago

    You have saved my life, I owe you everything.

  • @manikosuru5712
    @manikosuru5712 5 years ago +1

    Amazing Videos...Only one word to say "Fan"

  • @saritagautam9328
    @saritagautam9328 3 years ago

    This is really cool. The first time I actually understood it. Hats off, man.

  • @armanporwal4032
    @armanporwal4032 4 years ago +2

    OP... nice teaching... Why don't we get teachers like you in every institute and college??

  • @channel8048
    @channel8048 1 year ago

    Thank you so much for this! You are a good teacher

  • @adityashewale7983
    @adityashewale7983 11 months ago

    Hats off to you sir, your explanation is top level. Thank you so much for guiding us...

  • @skviknesh
    @skviknesh 3 years ago +1

    Thanks ! That was really awesome.

  • @hashimhafeez21
    @hashimhafeez21 3 years ago

    The first time I understood it very well, thanks to your explanation.

  • @devgak7367
    @devgak7367 4 years ago

    Just an awesome explanation of gradient descent.

  • @uddalakmitra1084
    @uddalakmitra1084 2 years ago

    Excellent presentation Krish Sir .. You are great

  • @sandeepganage9717
    @sandeepganage9717 4 years ago

    Brilliant explanation!

  • @mohammedsaif3922
    @mohammedsaif3922 3 years ago

    Krish, you're awesome. I finally understood the chain rule from you. Thanks again, Krish.

  • @chandanbp
    @chandanbp 4 years ago

    Great stuff for free. Kudos to you and your channel

  • @vishalshukla2happy
    @vishalshukla2happy 4 years ago +1

    Great way to explain it, man... keep going.

  • @aminzaiwardak6750
    @aminzaiwardak6750 4 years ago +1

    Thank you sir, you explain very well; keep it up.

  • @tanvirantu6623
    @tanvirantu6623 3 years ago

    love you sir, love ur effort. love from Bangladesh.

  • @deepaktiwari9854
    @deepaktiwari9854 3 years ago +12

    Nice informative video. It helped me in understanding the concept. But I think at the end there is a mistake: you should not add the other path to calculate the derivative for W11^2. The addition should be done if we are calculating the derivative for O11.
    dL/dW11^2 = (dL/dO31 * dO31/dO21 * dO21/dW11^2)

    • @grownupgaming
      @grownupgaming 2 years ago

      Yes Deepak, I noticed the same thing. There's a mistake around 12:21; no addition is needed.

    • @anupampurkait6066
      @anupampurkait6066 2 years ago

      Yes Deepak, you are correct. I also think the same.

    • @albertmichaelofficial8144
      @albertmichaelofficial8144 11 months ago

      Is that because we are calculating based on O31, and O31 depends on both outputs from the second layer?

  • @maheshvardhan1851
    @maheshvardhan1851 5 years ago +2

    great effort...

  • @tintintintin576
    @tintintintin576 4 years ago

    Such a helpful video :)
    Thanks

  • @yedukondaluannangi7351

    Thanks a lot for the videos, they helped me a lot.

  • @dnakhawa
    @dnakhawa 4 years ago

    You are too good, Krish. Nice data science content.

  • @sekharpink
    @sekharpink 5 years ago +2

    Very, very good explanation, very much understandable. Can I know in how many days you're planning to complete this entire playlist?

  • @saygnileri1571
    @saygnileri1571 2 years ago

    Nice one, thanks a lot!

  • @ZaChaudhry
    @ZaChaudhry 1 year ago

    ❤. God bless you, Sir.

  • @kamranshabbir2734
    @kamranshabbir2734 5 years ago +14

    The last partial derivative of the loss we calculated w.r.t. w11^2: is it correct that we have shown it as dependent upon two paths, one through w11^2 and the other through w12^2? Please make it clear, I am confused about it.

    • @wakeupps
      @wakeupps 4 years ago +11

      I think this is wrong! Maybe he wanted to discuss w11^1? However, a fourth term should be added in the sum. Idk.

    • @imranuddin5526
      @imranuddin5526 4 years ago +1

      @@wakeupps Yes, I think he got confused and it was w11^1.

    • @Ip_man22
      @Ip_man22 4 years ago +4

      Assume he is explaining W11^1 and you'll understand everything. From the diagram itself you can see the connections and can clearly see which weights depend on each other.
      Hope this helps

    • @akrsrivastava
      @akrsrivastava 4 years ago +4

      Yes, he should not have added the second term in the summation.

    • @gouravdidwania1070
      @gouravdidwania1070 2 years ago

      @@akrsrivastava Correct, no second term needed for W11^2.

  • @sundara2557
    @sundara2557 4 years ago

    I am going through your videos. You are rocking, bro.

  • @aravindvarma5679
    @aravindvarma5679 4 years ago

    Thanks Krish...

  • @meanuj1
    @meanuj1 5 years ago +1

    Nice. And a request: please add some videos on optimizers...

  • @punyanaik52
    @punyanaik52 4 years ago +15

    Bro, there is a correction needed in this video... watch out for the last 3 minutes and correct the mistake. Thanks for your efforts.

  • @vishaljhaveri6176
    @vishaljhaveri6176 2 years ago

    Thank you sir.

  • @good114
    @good114 2 years ago +1

    Thank you Sir 🙏🙏🙏🙏♥️☺️♥️

  • @pranjalbahore6983
    @pranjalbahore6983 2 years ago

    so insightful @krish

  • @arpitdas2530
    @arpitdas2530 4 years ago +2

    Your teaching is great, sir. But can we also get a video about how to apply these practically in Python?

  • @shashireddy7371
    @shashireddy7371 4 years ago

    Well explained video

  • @hokapokas
    @hokapokas 5 years ago +1

    Loved it man... great effort in explaining the maths behind it and the chain rule. Please make a video on its implementation soon. As usual, great work. Looking forward to the videos. Cheers.

    • @shivamjalotra7919
      @shivamjalotra7919 4 years ago +1

      Hello Sunny, I myself have stitched an absolutely brilliant repository explaining all the implementation details behind an ANN. See this: github.com/jalotra/Neural_Network_From_Scratch

    • @kshitijzutshi
      @kshitijzutshi 2 years ago

      @@shivamjalotra7919 Great effort. Starred it. ⭐👍🏼

    • @shivamjalotra7919
      @shivamjalotra7919 2 years ago +1

      @@kshitijzutshi Try to implement it yourself from scratch. See George Hotz's Twitch stream for this.

    • @kshitijzutshi
      @kshitijzutshi 2 years ago

      @@shivamjalotra7919 Any recommendations for understanding the image segmentation problem using CNNs? Resources?

  • @camilogonzalezcabrales2227

    Excellent video. I'm new to the field; could someone explain to me how the O's are obtained? Are the O's the result of each neuron's computation? Are the O's numbers or equations?
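
    A rough sketch of one common reading of the notation (assuming sigmoid activations; the input and weight values below are made up): each O is simply the number a neuron outputs after applying its activation function to the weighted sum of its inputs plus a bias.

      from math import exp

      def sigmoid(z):
          return 1.0 / (1.0 + exp(-z))

      x1, x2 = 0.5, 0.8                      # input features
      w11_1, w21_1, b11 = 0.3, -0.1, 0.05    # weights and bias of neuron f11 (illustrative)

      z11 = w11_1 * x1 + w21_1 * x2 + b11    # weighted sum computed inside f11
      O11 = sigmoid(z11)                     # O11 is just this number, passed to the next layer
      print(O11)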

  • @quranicscience9631
    @quranicscience9631 4 years ago

    very good content

  • @sivaveeramallu3645
    @sivaveeramallu3645 4 years ago

    excellent Krish

  • @ruchikalalit1304
    @ruchikalalit1304 4 years ago +8

    @ 10:28 - 11:22 Krish, do we need both of the paths to be added, since w11^2 is not affected by the lower path, i.e. w12^2? Please tell.

    • @amit_sinha
      @amit_sinha 4 years ago +2

      The second part of the summation should not come into the picture, as it would come in only when we are calculating dL/dw12^2.

    • @latifbhanger
      @latifbhanger 4 years ago

      @@amit_sinha i think that is correct.

    • @niteshhebbare3339
      @niteshhebbare3339 3 years ago

      @@amit_sinha
      Yes I have the same doubt!

    • @vishaldas6346
      @vishaldas6346 3 years ago +1

      Not required; it's not correct, as w11^2 is not affected by the lower weights. The first part is correct, and the summation is required when we are thinking about w11^1.

    • @grownupgaming
      @grownupgaming 2 years ago

      @@vishaldas6346 Yes!

  • @cynthiamoricordova5099

    Thank you so much for all your videos. I have a question with respect to the value assigned to the bias. Is this value random? I would appreciate your answer.

  • @tobiasfan5407
    @tobiasfan5407 10 months ago

    thank you sir

  • @rajshekharrakshit9058
    @rajshekharrakshit9058 3 years ago +1

    Sir, I think one thing you are doing is wrong.
    As w11^3 impacts O31, there is also an activation part in between,
    so dL/dw11^3 = dL/dO31 . dO31/df1 . df1/dw11^3.
    I might be wrong; can you please clear up my query?
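
    Writing the commenter's point out explicitly, assuming z31 denotes the weighted sum inside f31 (what the comment calls f1) and a sigmoid activation, the extra activation factor looks like this:

      z_{31} = w_{11}^{3} O_{21} + w_{12}^{3} O_{22} + b_{3}, \qquad O_{31} = \sigma(z_{31})

      \frac{\partial L}{\partial w_{11}^{3}}
        = \frac{\partial L}{\partial O_{31}} \cdot \frac{\partial O_{31}}{\partial z_{31}} \cdot \frac{\partial z_{31}}{\partial w_{11}^{3}}
        = \frac{\partial L}{\partial O_{31}} \cdot \sigma'(z_{31}) \cdot O_{21}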

  • @ga43ga54
    @ga43ga54 5 years ago +2

    Can you please do a Live Q&A session !? Great video... Thank you

    • @krishnaik06
      @krishnaik06  5 years ago +3

      Let me upload some more videos, then I will do a Live Q&A session.

  • @enquiryadmin8326
    @enquiryadmin8326 4 years ago

    In the backpropagation calculation of the gradients using the chain rule for w11^1, I think we need to consider 6 paths. Please kindly clarify.

  • @pranjalgupta9427
    @pranjalgupta9427 2 years ago +1

    Nice 👍👏🥰

  • @viveksm863
    @viveksm863 3 years ago +1

    I'm able to understand the concepts you are explaining, but I don't know where we get the values for the weights in forward propagation. Could you brief us on that once, if possible?

  • @utkarshashinde9167
    @utkarshashinde9167 3 years ago

    Sir, if we are giving every single neuron in a hidden layer the same weights, features and bias, then what is the use of multiple neurons in a single layer?

  • @sekharpink
    @sekharpink 5 years ago +1

    Hi Krish,
    Please upload videos on a regular basis. I'm eagerly waiting for your videos.
    Thanks in advance.

    • @krishnaik06
      @krishnaik06  5 years ago +2

      Uploaded, please check tutorial 7.

    • @sekharpink
      @sekharpink 5 years ago

      @@krishnaik06 thank you..please keep posting more videos..I'm really waiting to watch your videos..really liked your way of explanation

  • @sapito169
    @sapito169 1 year ago

    Finally I understand it.

  • @louerleseigneur4532
    @louerleseigneur4532 3 years ago

    thanks sir

  • @saitejakandra5640
    @saitejakandra5640 5 years ago +3

    Pls upload ROC AUC related concepts.

  • @latifbhanger
    @latifbhanger 4 years ago +3

    Awesome, mate. However, I think you got carried away with the second part being added; read the comments below and correct it, please. W12 may not need to be added. But it all makes sense. A very good explanation.

  • @dipankarrahuldey6249
    @dipankarrahuldey6249 3 years ago +4

    I think this part, dL/dw11^2, should be (dL/dO31 * dO31/dO21 * dO21/dw11^2). If we take the derivative of L w.r.t. w11^2, then w12^2 doesn't come into play. In that case, dL/dw12^2 = (dL/dO31 * dO31/dO22 * dO22/dw12^2).

    • @raj4624
      @raj4624 2 years ago

      Agree... dL/dw11^2 should be (dL/dO31 * dO31/dO21 * dO21/dw11^2), with no extra term from the addition.

  • @tabilyst
    @tabilyst 3 years ago

    Hi Krish, can you please let me know: if we are calculating the derivative for the weight w11^2, then why are we adding the derivative through the weight w12^2 to it? Please clarify.

  • @sandipansarkar9211
    @sandipansarkar9211 4 years ago

    Yeah, I did understand the chain rule, but being a fresher, please provide some easy-to-study articles on the chain rule so that I can increase my understanding before proceeding further.

  • @yuvi12
    @yuvi12 4 years ago

    But sir, other sources on the internet show a different loss function. Which one should I believe?

  • @mikelrecacoechea8730
    @mikelrecacoechea8730 2 years ago

    Hey Krish, good explanation.
    I think there is one correction: in the end you explained it for w11^2, but what I feel is that it is for w11^1.

  • @jontyroy1723
    @jontyroy1723 11 months ago

    In the step where dL/dw[2]11 was shown as addition of two separate chain rule outputs, should it not be dL/dw[2]1 ?

  • @omkarpatil2854
    @omkarpatil2854 4 years ago +3

    Thank you for the great explanation.
    I have a question: with this formula, the expression generated for dL/dW11 looks exactly the same as for dL/dW12.
    Am I right? Do both values get the same weight difference during backpropagation (though the old W values will be different)? (See the note at the end of this thread.)

    • @SunnyKumar-tj2cy
      @SunnyKumar-tj2cy 4 years ago

      Same question.
      What I think is that, as we are finding the new weights, W11 and W12 for HL2 should both be different and should not be added, or I am missing something.

    • @abhinaspadhi8351
      @abhinaspadhi8351 4 years ago

      @@SunnyKumar-tj2cy Yeah, both should not be added as they are different...

    • @spurthygopal1239
      @spurthygopal1239 4 years ago

      Yes i have same question too!

    • @varunmanjunath6204
      @varunmanjunath6204 3 years ago

      @@abhinaspadhi8351 It's wrong.
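
    For reference, writing both gradients out under the layout used in the video (assuming w11^2 feeds f21 and w12^2 feeds f22) shows they share only the first factor; the remaining factors run through different neurons, so the two updates are generally different numbers:

      \frac{\partial L}{\partial w_{11}^{2}}
        = \frac{\partial L}{\partial O_{31}} \cdot \frac{\partial O_{31}}{\partial O_{21}} \cdot \frac{\partial O_{21}}{\partial w_{11}^{2}},
      \qquad
      \frac{\partial L}{\partial w_{12}^{2}}
        = \frac{\partial L}{\partial O_{31}} \cdot \frac{\partial O_{31}}{\partial O_{22}} \cdot \frac{\partial O_{22}}{\partial w_{12}^{2}}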

  • @chaitanyakumarsomagani592

    Krish sir, is it that w12^2 has to depend on w11^2 for us to do the differentiation that way? w12^2 goes one way and w11^2 goes another way.

  • @Skandawin78
    @Skandawin78 4 years ago

    Do you update the bias during backpropagation along with the weights? Or does it remain constant after initialization?

  • @Pink_Bear_
    @Pink_Bear_ 1 year ago +1

    Here we use the optimizer to update the weight; the slope is dL/dw, so is the w here w_old or something else?

  • @gunjanagrawal8626
    @gunjanagrawal8626 1 year ago +1

    Could you please recheck the video at around 11:00? The W11 weight update should be independent of W12.

  • @hafi029
    @hafi029 3 years ago

    A doubt in dL/dw11: is that correct? Do we need to add?

  • @pratikchakane5148
    @pratikchakane5148 4 years ago

    If we are calculating the updated weight for W11^2, then why do we need to add the term through W12^2?

  • @pratikgudsurkar8892
    @pratikgudsurkar8892 4 years ago +2

    We are solving a supervised learning problem, which is why we have the loss as actual minus predicted. What about the unsupervised case, where we don't have an actual y? How is the loss calculated, and how does the update happen?

    • @benvelloor
      @benvelloor 3 years ago

      I don't think there will be backpropagation in unsupervised learning!

  • @waynewu7763
    @waynewu7763 4 days ago

    How do you take the derivative dO31/dO21? What kind of equations are those?
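
    One way to read that factor, assuming O31 is produced by a sigmoid applied to a weighted sum z31 of the previous layer's outputs (the video's exact activation may differ):

      O_{31} = \sigma(z_{31}), \quad z_{31} = w_{11}^{3} O_{21} + w_{12}^{3} O_{22} + b_{3}
      \;\Longrightarrow\;
      \frac{\partial O_{31}}{\partial O_{21}} = \sigma'(z_{31}) \, w_{11}^{3} = O_{31} (1 - O_{31}) \, w_{11}^{3}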

  • @tobiasfan5407
    @tobiasfan5407 10 months ago

    subscribed

  • @nikhilramabadran2959
    @nikhilramabadran2959 3 years ago

    For calculating the derivative of the loss function w.r.t. W11^2, why do you also consider the other branch leading to the output?? Kindly reply.

    • @nikhilramabadran2959
      @nikhilramabadran2959 3 years ago

      It's mentioned clearly that it's w.r.t. only W11^2 - that's the reason I'm asking this question.

  • @shindepratibha31
    @shindepratibha31 3 years ago

    Hey Krish, your way of explanation is good.
    I think there is one correction: in the end you explained it for w11^2, but what I feel is that it is for w11^1. It would be really helpful if you corrected it, because many are getting confused by it.

    • @aneeshkalita7452
      @aneeshkalita7452 1 year ago

      I think the same.. But great method of teaching.. there is no doubting that

  • @bsivarahulreddy
    @bsivarahulreddy 3 years ago

    Sir, O31 is also impacted by the weight W11^3, right? Why are we not taking that derivative in the chain rule?

  • @jpovando25
    @jpovando25 4 years ago

    Hello. Do you know neural networks using the Statistica software?

  • @kumarsaimirthipati8173

    Please tell me what that O represents.

  • @aswinthviswakumar64
    @aswinthviswakumar64 3 years ago

    Great video and a great initiative, sir.
    From 12:07, if we use the same method to calculate dL/dW12^2, it will be the same as dL/dW11^2.
    Is this the correct way, or am I getting it wrong?
    Thank you!

  • @vishalgupta3175
    @vishalgupta3175 3 years ago

    Hi sir, sorry to ask, but which degree have you completed? You are awesome!

  • @JaySingh-gv8rm
    @JaySingh-gv8rm 4 years ago

    How can we compute dL/dO31, or what is the formula to find dL/dO31?
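
    Assuming the squared-error loss used in this series, L = (y - O31)^2, the first factor of the chain comes straight from differentiating the loss (for a different loss function this derivative changes accordingly):

      L = (y - O_{31})^{2} \;\Longrightarrow\; \frac{\partial L}{\partial O_{31}} = -2\,(y - O_{31})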

  • @ThachDo
    @ThachDo 4 years ago +1

    10:44 you are pointing to w11^1, but why is the formula on the board the derivative w.r.t. w11^2?

    • @winviki123
      @winviki123 4 years ago

      That's correct.
      Even I was wondering the same

  • @Philanthropic-fg8xx
    @Philanthropic-fg8xx 4 months ago

    Then what will be the formula for derivative of loss wrt w12^2 ?