Backpropagation explained | Part 4 - Calculating the gradient

  • Added 24 Jul 2024
  • We're now on part 4 of our journey through understanding backpropagation. In our last video, we focused on how we can mathematically express certain facts about the training process. Now we're going to use those expressions to help us differentiate the loss of the neural network with respect to the weights.
    Recall from the video that covered the intuition for backpropagation that, for stochastic gradient descent to update the weights of the network, it first needs to calculate the gradient of the loss with respect to those weights. Calculating this gradient is exactly what we'll be focusing on in this video.
    We'll start by checking out the equation that backprop uses to differentiate the loss with respect to a weight in the network. We'll see that this equation is made up of multiple terms, so next we'll break down and focus on each of these terms individually. Lastly, we'll combine the results from each term to obtain the final result, which is the gradient of the loss function.
    🕒🦎 VIDEO SECTIONS 🦎🕒
    00:00 Welcome to DEEPLIZARD - Go to deeplizard.com for learning resources
    00:58 Agenda
    01:28 Derivative Calculations
    05:45 Calculation Breakdown - First term
    07:36 Calculation Breakdown - Second term
    08:52 Calculation Breakdown - Third term
    11:56 Summary
    13:56 Collective Intelligence and the DEEPLIZARD HIVEMIND
    💥🦎 DEEPLIZARD COMMUNITY RESOURCES 🦎💥
    👋 Hey, we're Chris and Mandy, the creators of deeplizard!
    👉 Check out the website for more learning material:
    🔗 deeplizard.com
    💻 ENROLL TO GET DOWNLOAD ACCESS TO CODE FILES
    🔗 deeplizard.com/resources
    🧠 Support collective intelligence, join the deeplizard hivemind:
    🔗 deeplizard.com/hivemind
    🧠 Use code DEEPLIZARD at checkout to receive 15% off your first Neurohacker order
    👉 Use your receipt from Neurohacker to get a discount on deeplizard courses
    🔗 neurohacker.com/shop?rfsn=648...
    👀 CHECK OUT OUR VLOG:
    🔗 / deeplizardvlog
    ❤️🦎 Special thanks to the following polymaths of the deeplizard hivemind:
    Tammy
    Mano Prime
    Ling Li
    🚀 Boost collective intelligence by sharing this video on social media!
    👀 Follow deeplizard:
    Our vlog: / deeplizardvlog
    Facebook: / deeplizard
    Instagram: / deeplizard
    Twitter: / deeplizard
    Patreon: / deeplizard
    YouTube: / deeplizard
    🎓 Deep Learning with deeplizard:
    Deep Learning Dictionary - deeplizard.com/course/ddcpailzrd
    Deep Learning Fundamentals - deeplizard.com/course/dlcpailzrd
    Learn TensorFlow - deeplizard.com/course/tfcpailzrd
    Learn PyTorch - deeplizard.com/course/ptcpailzrd
    Natural Language Processing - deeplizard.com/course/txtcpai...
    Reinforcement Learning - deeplizard.com/course/rlcpailzrd
    Generative Adversarial Networks - deeplizard.com/course/gacpailzrd
    🎓 Other Courses:
    DL Fundamentals Classic - deeplizard.com/learn/video/gZ...
    Deep Learning Deployment - deeplizard.com/learn/video/SI...
    Data Science - deeplizard.com/learn/video/d1...
    Trading - deeplizard.com/learn/video/Zp...
    🛒 Check out products deeplizard recommends on Amazon:
    🔗 amazon.com/shop/deeplizard
    🎵 deeplizard uses music by Kevin MacLeod
    🔗 / @incompetech_kmac
    ❤️ Please use the knowledge gained from deeplizard content for good, not evil.

Comments • 133

  • @deeplizard
    @deeplizard  6 years ago +11

    Backpropagation explained | Part 1 - The intuition
    czcams.com/video/XE3krf3CQls/video.html
    Backpropagation explained | Part 2 - The mathematical notation
    czcams.com/video/2mSysRx-1c0/video.html
    Backpropagation explained | Part 3 - Mathematical observations
    czcams.com/video/G5b4jRBKNxw/video.html
    Backpropagation explained | Part 4 - Calculating the gradient
    czcams.com/video/Zr5viAZGndE/video.html
    Backpropagation explained | Part 5 - What puts the “back” in backprop?
    czcams.com/video/xClK__CqZnQ/video.html
    Machine Learning / Deep Learning Fundamentals playlist: czcams.com/play/PLZbbT5o_s2xq7LwI2y8_QtvuXZedL6tQU.html
    Keras Machine Learning / Deep Learning Tutorial playlist: czcams.com/play/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL.html

    • @you_dont_know_me_sir
      @you_dont_know_me_sir 5 years ago

      deeplizard That Scientific Notebook you explain from in multiple videos seems great as a collection of notes to review when needed. Have you put it somewhere? GitHub?

    • @sampro454
      @sampro454 4 years ago

      5:35 shouldn't there be a da/dg in there?

    • @sampro454
      @sampro454 4 years ago

      Never mind, you involve g when differentiating a with respect to z later. Very nice.

  • @abhishek-shrm
    @abhishek-shrm 3 years ago +8

    So much professionalism in a YouTube video course that is free. Thank you for making these videos.

  • @naughtrussel5787
    @naughtrussel5787 4 years ago +33

    I've been stuck with backpropagation for several days. I've tried a bunch of resources, but you are the only one whose way of explaining this is exactly what I needed. Thanks a lot for doing these "unfolded" calculations and emphasizing _the purpose_ of doing this or that thing. This is what's often missing in ANN courses. Great job!

  • @Sikuq
    @Sikuq 2 years ago +4

    Although a complex issue, this presentation makes it much easier to understand. Thanks deeplizard.

  • @MalTimeTV
    @MalTimeTV a year ago +3

    These videos of yours on the mathematics of back-propagation are just incredible. Thank you very much.

  • @bildadatsegha6923
    @bildadatsegha6923 a year ago +3

    Awesome. I am learning Deep learning as a complete novice and you have been truly helpful. Thanks.
    I really love the simplicity of your lectures.

  • @edbshllanss
    @edbshllanss 3 years ago +2

    A year ago, I started learning deep learning, first through your content. I acquired some intuition, but this series as well as the Keras series went over my head because I was totally unfamiliar with programming and mathematics. But a year later, with basic knowledge of Python and mathematics like calculus, it has become much easier to follow the thread of your videos, and I feel I am finally standing at the starting point of machine learning. Thank you so much for your straightforward explanations!!!

  • @vaibhavkhobragade9773
    @vaibhavkhobragade9773 2 years ago +3

    Clear, concise, and perfect understanding. Thank you, Mandy!

  • @chintanmathia
    @chintanmathia 4 years ago +4

    This is the epitome of explaining such a difficult topic with such simplicity.
    Thanks a lot...
    I could not stop going through all 36 videos in one go... Amazing job, ma'am.

  • @pawarranger
    @pawarranger 5 years ago +11

    this is now my favourite ANN playlist, thanks a ton!

  • @BabisPlaysGuitar
    @BabisPlaysGuitar 5 years ago +7

    Awesome! All the advanced calculus and linear algebra classes that I took back in engineering school make sense now. Thank you very much!

  • @rabirajbanerjee3872
    @rabirajbanerjee3872 4 years ago +7

    After watching your video I could actually do the derivation all by myself, thanks for the intuition :)

  • @antoinetoussaint483
    @antoinetoussaint483 4 years ago +6

    Clear, precise, consistent. What a channel, thx.

  • @fatihandyildiz
    @fatihandyildiz 3 years ago +1

    Just wow! Normally, it's really hard for me to fully understand these derivations (even after watching multiple times), but you just made it happen at 1.5x speed. Thank you for offering this high-quality tutorial for free. Blessings to you.

  • @MJ2403
    @MJ2403 4 years ago +3

    You are a gem... I'm finally able to understand backpropagation, which I was struggling with like anything.

  • @shakyasarkar7143
    @shakyasarkar7143 4 years ago +1

    You are a legend, ma'am!!!
    Truly!!
    I have been searching for this complete backprop calculus derivation throughout all the YouTube videos until I came upon yours... I even looked at some Udemy courses. Nobody, I REPEAT NOBODY, has explained this full derivation, or even dared to.
    Thank you, ma'am!
    I owe you a lot!

  • @nikosips
    @nikosips 3 years ago +2

    Thank you for those videos! Your explanations are crisp and clear, and very helpful! You deserve many more subs!

  • @SafeBuster80
    @SafeBuster80 4 years ago +3

    Thank you for your videos of backpropagation, I now understand this subject as you explained it nicely and clearly (unlike my uni professor).

  • @TheMaidenReturns
    @TheMaidenReturns 4 years ago +1

    Wow.. just wow. I have been really struggling this year in uni with my A.I. module, as the teacher doesn't really explain things well. I really can't believe how simple and easy to understand you can make this topic. This series has saved me from failing a module this year, and it helped me learn so much about deep learning. Amazing content, well explained. Big up for this

  • @moizahmed8987
    @moizahmed8987 4 years ago +2

    Terrific video, thank you very much
    this is the first video that goes through backprop step by step

  • @bobhut8613
    @bobhut8613 4 years ago +1

    Thank you so much for this! I had been stuck trying to wrap my head around the maths for days and your videos really helped.

  • @joshuayudice8245
    @joshuayudice8245 4 years ago

    Seriously, you are a godsend. Thank you for creating these clear and methodical videos.

  • @EliorBY
    @EliorBY 3 years ago +2

    wow. what an amazing illustrative mathematical explanation. just what I was looking for. thank you very much deep lizard!

  • @weactweimpactcharityassoci3964

    this is now my favorite ANN playlist, thanks a ton!
    شكرا (Thank you)

  • @satnav1377
    @satnav1377 5 years ago +2

    Incredibly clear explanation, great vid once again!

  • @trankhanhang8151
    @trankhanhang8151 3 years ago +1

    So simple and elegant, I wish I found you sooner.

  • @Luis-fh8cv
    @Luis-fh8cv 6 years ago +1

    Thank you deeplizard, this is very helpful. I can code backpropagation just fine for ANNs that use the sigmoid function and MSE, but I've always struggled to follow the gradient descent and backprop math.

  • @databridgeconsultants9163

    Thank you so much, guys. This series is just the BEST ever made. It's legendary work done by you guys. I have read so many books, and even my prof was not able to make us understand how these things actually work step by step. All I understood in the past was to ditch this portion of neural networks. But now I can confidently explain what's going on inside a neural network. I have subscribed to the paid version of yours.

    • @deeplizard
      @deeplizard  4 years ago

      That's great to hear! Really happy that you gained new knowledge. Thank you for letting us know :)
      By the "paid version," are you referring to becoming a member of the deeplizard hivemind via Patreon?

  • @lancelotdsouza4705
    @lancelotdsouza4705 2 years ago +1

    Thanks so much, you made backpropagation a cakewalk.

  • @minhdang5132
    @minhdang5132 4 years ago +1

    Brilliant explanation. Thanks a lot!

  • @georgezhou1287
    @georgezhou1287 4 years ago +1

    Your work is a godsend. Thank you.

  • @AnandaKevin28
    @AnandaKevin28 3 years ago +2

    Just. Great. Explanation. Words are not enough to express it. Thanks a lot for the explanation! 😁

  • @EinsiJo
    @EinsiJo 4 years ago +1

    Extremely useful! Thank you!

  • @durgamanoja8179
    @durgamanoja8179 6 years ago +1

    I have gone through your series, and I must say you are AWESOME!! I could not understand the mathematics behind backpropagation from any website or video; you made it very clear. Thanks a lot. Please do make more videos like these.

    • @deeplizard
      @deeplizard  6 years ago

      Thank you, durga! I'm so happy to hear this 😄
      If you're also interested in implementing the neural network concepts from this Deep Learning Fundamentals series in code, check out both our Keras and TensorFlow.js series!
      Keras: czcams.com/play/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL.html
      TensorFlow.js: czcams.com/play/PLZbbT5o_s2xr83l8w44N_g3pygvajLrJ-.html

  • @gbyu7638
    @gbyu7638 3 years ago +1

    Such a clear explanation and calculation!

  • @rik43
    @rik43 3 years ago +1

    Finally I got it, thank you!

  • @aruchan9890
    @aruchan9890 8 months ago +1

    Thank you so much for this, really helpful!

  • @danielrodriguezgonzalez2982

    Can't say it enough, the best!

  • @todianmishtaku6249
    @todianmishtaku6249 4 years ago +1

    Excellent!!!

  • @luisluiscunha
    @luisluiscunha 10 months ago

    Very much appreciated. Very nice explanation.

  • @hairuiliu3446
    @hairuiliu3446 5 years ago +1

    very well explained, thanks

  • @samaryadav7208
    @samaryadav7208 6 years ago +2

    Great video. Waiting for the next part.

  • @andonglin8900
    @andonglin8900 6 years ago +2

    Easy to follow. Thanks a lot!

    • @deeplizard
      @deeplizard  6 years ago +1

      I'm glad you think so, Andong! And you're welcome!

  • @richarda1630
    @richarda1630 3 years ago +1

    everyone else has said it all :) Thanks so much!

    • @richarda1630
      @richarda1630 3 years ago

      to bolster my newbie mind :P I watched 3B1B also to help me understand what you discussed here :) czcams.com/video/Ilg3gGewQ5U/video.html

  • @freedmoresidume
    @freedmoresidume 2 years ago

    You are the best ❤️

  • @haadialiaqat4590
    @haadialiaqat4590 2 years ago +1

    Thank you so much for such a nice explanation. Please make more videos.

  • @Jxordan
    @Jxordan 6 years ago +1

    Thank you! Dedicating my midterm today to you.
    Also just a random tip, if you don't use cortana you can right click the "type here to search" and hide it

    • @deeplizard
      @deeplizard  6 years ago +1

      Thanks for the tip! How did your midterm go?

  • @khawarshahzad5721
    @khawarshahzad5721 6 years ago +1

    Hello deeplizard,
    great video!
    Can you please explain how the partial derivative of the loss would be calculated for a batch size greater than 1?
    Thanks.

    • @deeplizard
      @deeplizard  6 years ago

      Hey Khawar - Thanks!
      To summarize, you take the gradient of the loss with respect to a particular weight for _each_ input. You then average the resulting gradients and update the given weight with that average. This would be the case if you passed all the data to your network at once. If instead you were doing batch gradient descent, where you were passing mini-batches of data to your network at a time, then you would apply this same method to _each batch_ of data, rather than to all the data at once.
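      A quick sketch of that averaging step in code (not from the video; the toy one-weight model, names, and numbers here are made up, assuming a squared-error loss and sigmoid activation):

      import numpy as np

      def sigmoid(z):
          return 1.0 / (1.0 + np.exp(-z))

      def grad_for_sample(w, a_prev, y):
          # Gradient of the squared-error loss w.r.t. one weight, for one sample,
          # for a toy single-input output node: z = w * a_prev, a = g(z).
          z = w * a_prev
          a = sigmoid(z)
          return 2.0 * (a - y) * a * (1.0 - a) * a_prev

      w, lr = 0.5, 0.1
      samples = [(0.3, 1.0), (0.8, 0.0), (0.1, 1.0)]          # (a_prev, y) pairs

      avg_grad = np.mean([grad_for_sample(w, a_prev, y) for a_prev, y in samples])
      w -= lr * avg_grad                                      # one update using the averaged gradient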

  • @chickensalad1369
    @chickensalad1369 4 years ago

    High schooler here, armed only with add-maths-level calculus, entering the boss room. Had to spend almost 3 hours just on the entire backpropagation process, filling holes in my mathematics along the way (such as partial derivatives) with other online math tutorials. It was hard but worth it. These will be my final words as my brain liquefies and escapes from my ears, bye.....

  • @timxu1766
    @timxu1766 4 years ago +1

    thank you deeplizard!!!

  • @williamdowling
    @williamdowling 5 years ago +2

    Awesome videos, thanks! I am curious though what happens if a g function is not differentiable? I guess that is common, too, for example g(x) = {0 if x

  • @justchill99902
    @justchill99902 5 years ago +1

    Hey there! The daunting backprop math proof went through as smooth as butter after watching these 5 videos. Thank you so much.
    Sometimes I think the book itself is speaking lol. I think that one dislike is by mistake.
    Question - At 8:36, you talk about why we get the derivative as g prime. Could you please explain what it means, and its relation to a sub 1?
    PS: You are my most favourite YouTube channel. You earned it. I think your content is definitely making a difference in the world. :) Please carry on!

    • @deeplizard
      @deeplizard  5 years ago +2

      Hey Nirbhay - Apologies for the delayed response! Somehow this comment was tucked away, and I just came across it.
      Thank you so much for your kind remarks! Really glad to hear you're enjoying and learning from the content.
      For your question from 8:36:
      (I'll eliminate the superscripts and subscripts from my explanation below.)
      Our objective is to differentiate a with respect to z.
      Recall that a = g(z) by definition.
      Taking the derivative of a with respect to z means that we need to take the derivative of g(z) with respect to z.
      Since g is a function of z, this gives us g'(z) as the derivative.
      Let me know if this helps.
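      Written out (the same point in symbols; the sigmoid line is just one common example of g, not something the video requires):

      a^{(L)}_1 = g\left(z^{(L)}_1\right)
      \quad\Longrightarrow\quad
      \frac{\partial a^{(L)}_1}{\partial z^{(L)}_1}
      = \frac{d}{dz^{(L)}_1}\, g\left(z^{(L)}_1\right)
      = g'\left(z^{(L)}_1\right)
      % e.g. if g is the sigmoid, g(z) = 1/(1 + e^{-z}), then g'(z) = g(z)(1 - g(z))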

  • @jdmik
    @jdmik 6 years ago +1

    Thanks for the great videos! Just wondering if you were planning on doing a video on how backprop is applied in convnets?

    • @deeplizard
      @deeplizard  6 years ago

      Hey Johan - Glad you're liking the videos! I currently don't have this as an immediate topic to cover, but I will add it to my list to explore further as a possible future video.

  • @krishnaik06
    @krishnaik06 6 years ago +1

    Nice video. Can you please explain the backpropagation part using plain Python code instead of only Keras? I have written the feed-forward propagation part but was not able to write the code for backpropagation. Please help.

    • @deeplizard
      @deeplizard  6 years ago +1

      Thanks, Krish! I don't have any code I've written myself that implements the backprop math that I've illustrated in the set of backprop videos. When I searched online for backpropagation in Python though, I saw some open sourced resources that you might be able to check out to assist with your implementation. This is one of the top results that came back from my query: machinelearningmastery.com/implement-backpropagation-algorithm-scratch-python/
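      For anyone landing here later, a very small from-scratch sketch (my own rough illustration, not code from the video or the linked article), assuming a 2-2-1 network, sigmoid activations, squared-error loss, and no biases:

      import numpy as np

      def sigmoid(z):
          return 1.0 / (1.0 + np.exp(-z))

      rng = np.random.default_rng(0)
      W1 = rng.normal(size=(2, 2))             # weights: input layer -> hidden layer
      W2 = rng.normal(size=(1, 2))             # weights: hidden layer -> output layer
      x = np.array([0.5, -0.2])                # one training sample
      y = np.array([1.0])                      # its target
      lr = 0.1

      # Forward pass
      z1 = W1 @ x;  a1 = sigmoid(z1)
      z2 = W2 @ a1; a2 = sigmoid(z2)

      # Backward pass (chain rule, layer by layer)
      delta2 = 2.0 * (a2 - y) * a2 * (1.0 - a2)     # dC/dz2 for squared error + sigmoid
      dW2 = np.outer(delta2, a1)                    # dC/dW2
      delta1 = (W2.T @ delta2) * a1 * (1.0 - a1)    # dC/dz1, pushed back through W2
      dW1 = np.outer(delta1, x)                     # dC/dW1

      # Gradient-descent update
      W2 -= lr * dW2
      W1 -= lr * dW1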

  • @umair369
    @umair369 4 years ago

    Thanks this was very elaborate and thoroughly explained, however I was wondering how you would average the derivative with respect to one particular weight across ALL training examples. 11:40 is when you mention that. I am assuming the change in w_1_2 doesn't affect the loss for any other training example, is that true? Please let me know.

  • @Arjun-kt4by
    @Arjun-kt4by 4 years ago

    Hello, at 11:02 how did the derivative come out to be a2(L-1)? Are you considering a as a constant?

  • @r.balamurali8246
    @r.balamurali8246 4 years ago

    Thank you very much.
    At 12:22, does 'n' represent the number of training samples or the number of nodes in layer L? Could you please explain this?

  • @s25412
    @s25412 3 years ago

    11:46 Given a single weight, wouldn't there be multiple versions of that weight that correspond to each training sample? If so, on the right hand side where you sum over i, shouldn't w_12 be indexed with 'i' just like how you did it for C?

  • @harishh.s4701
    @harishh.s4701 2 years ago +1

    Hello,
    Thank you for sharing your knowledge with us. I really appreciate the effort put into these videos. This series on backpropagation clarified a lot of confusion and helped me to understand it more clearly. The explanation was clear and easy to follow. However, I have one small suggestion. In this video at the timestamp 11.53, the term 'n' is used to represent the number of training samples whereas in all the previous equations 'n' represents the number of neurons in a particular layer (Please correct me if I am wrong). Perhaps it would be better if you could use a different notation (like N) for the number of training samples to avoid confusion. Maybe it has already been updated. I apologize if this is a repetition. Otherwise, great work, Keep it up, and thanks a lot :)

  • @hussainbhavnagarwala2596
    @hussainbhavnagarwala2596 10 months ago

    Can you show the same example for a weight that is a few layers behind the output layer? I am not able to understand how we will sum the activations of each layer.

  • @thomasvinet6160
    @thomasvinet6160 6 years ago +2

    Great video, I just have a question: if we want to calculate the derivative for a weight in layer (L-2), will it be the same as for layer L-1, but replacing a with g(z), and so on? Thanks
    EDIT: didn't see the next video ;)
    These tutorials are very understandable, keep making them!!

    • @deeplizard
      @deeplizard  6 years ago

      Thanks, Thomas! So were you able to answer your question after watching the next video?

  • @wilmobaggins
    @wilmobaggins 4 years ago +3

    It might have been easier to follow the notation if you had shown the weight from node 2 to, say, node 3. The 1 looks a lot like an l to my aging eyes :)
    Thank you for the video, very helpful.

  • @RohitPrasad2512
    @RohitPrasad2512 5 years ago

    Can you add what happens if the weight is between two hidden layers? And also how to calculate the loss for that.

  • @tymothylim6550
    @tymothylim6550 3 years ago +1

    Thank you very much for this video! May I ask if the training sample refers to the "batch" in a given epoch? Thus, the average gradients calculated across all batches would be used for SGD?
    Thank you also for going through the mathematics step-by-step! It really helps to have someone go through the math, instead of just reading it on my own!

    • @deeplizard
      @deeplizard  3 years ago

      You're welcome Tymothy! Happy that you're enjoying the course.
      In this explanation, a sample refers to a single sample. However, most of the time, neural network APIs will calculate the gradients and do a weight update per batch. The per-batch update is referred to as "mini-batch gradient descent." I give a little note about it in the section of the blog below titled "Mini-Batch Gradient Descent":
      deeplizard.com/learn/video/U4WB9p6ODjM
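      Roughly, the per-batch update looks like this (a made-up toy model, not the blog's code):

      import numpy as np

      def grad_for_sample(w, x, y):
          # Placeholder per-sample gradient for a one-parameter model y_hat = w * x
          # with squared-error loss; stands in for a full backprop pass.
          return 2.0 * (w * x - y) * x

      X = np.linspace(0, 1, 12)
      Y = 3.0 * X                               # toy targets (the true w is 3.0)
      w, lr, batch_size = 0.0, 0.5, 4

      for start in range(0, len(X), batch_size):
          xb, yb = X[start:start + batch_size], Y[start:start + batch_size]
          avg_grad = np.mean([grad_for_sample(w, x, y) for x, y in zip(xb, yb)])
          w -= lr * avg_grad                    # one weight update per mini-batch
      print(w)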

  • @ericsonbitoon
    @ericsonbitoon 3 years ago

    Hi Mandy, do you have a good reference book to recommend?

  • @John-wx3zn
    @John-wx3zn 3 months ago

    Hi Mandy, how does the weight in L-1 connect to layer L? Doesn't L have its own weight?

  • @Loev06
    @Loev06 3 years ago

    Amazing video! I know you don't use biases in this series, but do you know the derivative of the cost function w.r.t. the biases?
    Edit: I think I found it, is it (dC0 / da(L)1) (da(L)1 / dZ(L)1) = 2(a(L)1 - y1)( g'(L) ( Z(L)1 ) )? (Basically the first two terms, because the third term is always equal to 1)
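    For later readers, the commenter's reasoning written out, assuming a bias enters as z = (weighted sum) + b, which this series leaves out:

    z^{(L)}_1 = \sum_{k} w^{(L)}_{1k}\, a^{(L-1)}_k + b^{(L)}_1
    \;\Longrightarrow\;
    \frac{\partial z^{(L)}_1}{\partial b^{(L)}_1} = 1,
    \qquad
    \frac{\partial C_0}{\partial b^{(L)}_1}
    = \frac{\partial C_0}{\partial a^{(L)}_1}\,
      \frac{\partial a^{(L)}_1}{\partial z^{(L)}_1}\,
      \frac{\partial z^{(L)}_1}{\partial b^{(L)}_1}
    = 2\left(a^{(L)}_1 - y_1\right)\, g'\left(z^{(L)}_1\right)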

  • @aamir122a
    @aamir122a 6 years ago +1

    In the future, you might look at doing videos on neural networks for reinforcement learning, approximating the value function and the policy function.

    • @deeplizard
      @deeplizard  6 years ago

      Thanks for the suggestion, Aamir!

  • @John-wx3zn
    @John-wx3zn 3 months ago

    Hi Mandy, since the output of a single node comes from the relu function, why isn't this output, a, written on the side of the arrow instead of the weight, w, when going from L-1 to L?

  • @Tntpker
    @Tntpker 5 years ago

    After thinking about it a bit, why is the expression @ 12:18 used where you sum all the partials of the cost function w.r.t. w12 _for all training examples_ and calculate an average partial derivative? I thought one would do this for batch gradient descent but not with stochastic gradient descent? Or am I seeing something completely wrong here?

    • @deeplizard
      @deeplizard  5 years ago

      Hey Tntpker - Yes, when _n_ is the number of samples in the entire training set, this is the case for _batch_ gradient descent. Also, if using _mini-batch_ gradient descent, which is normally what is done with most neural network APIs by default, then you could look at _n_ as being the number of training examples within a single batch, rather than the entire training set. With this, the gradient update would occur on a per-batch basis.

    • @Tntpker
      @Tntpker 5 years ago +1

      @@deeplizard Cheers!

  • @yeahorightbro
    @yeahorightbro 6 years ago +5

    Great video! How did you learn this stuff yourself?

    • @deeplizard
      @deeplizard  6 years ago +14

      Thanks, Daniel!
      For deep learning in general, I took a self-led study approach through a combination of online resources and exploring/building networks for my own use in personal projects. The online resources that I took the most away from were Jeremy Howard's and Rachel Thomas' fast.ai course, parts of Andrew Ng's Deep Learning and Machine Learning courses on Coursera, and Michael Nielsen's Neural Networks and Deep Learning book.
      In regards to the math- One of my degrees is in math, so that’s where that came from. :) Learning the math specific to neural networks was just a matter of applying the math that I already had experience with.

    • @zeus1082
      @zeus1082 6 years ago +1

      deeplizard I am doing the same as what you did, but I was not a math graduate; I just had a lot of interest in maths, so it's easy to learn these concepts.

    • @deeplizard
      @deeplizard  6 years ago +1

      Hey aneesh - That's cool! Thanks for sharing. Are you also following the same online resources I mentioned?

    • @zeus1082
      @zeus1082 6 years ago +1

      deeplizard No, except fast.ai; I am following Andrew Ng's videos, some online resources, and Udemy. Your tutorials are useful too. Like you said, my interest in math made these concepts easy. Keep posting videos like this.

  • @nourelislam8565
    @nourelislam8565 5 years ago

    Amazing explanation... but I just want to know: what is the purpose of taking the average of the loss for a certain weight over n training examples? I guess all we have to know is the change of the loss function throughout the training examples?

    • @deeplizard
      @deeplizard  5 years ago +1

      Hey Nour - It's because we want to know the average loss across all samples. This will tell us how our model performs on average across the entire data set.

  • @AndreaCatania
    @AndreaCatania 5 years ago

    Sorry if this question is stupid, but I don't understand exactly what the loss means. Knowing the loss for weight W12, how can I update the related weight?
    W12 += LOSS12 does not seem correct to me.

  • @money_wins_controls
    @money_wins_controls 5 years ago

    Guys, please help.
    At 8:44, I am curious why the g function is not added, i.e. g'·z + g (according to the product rule). Why did they neglect g after differentiation?

    • @ssffyy
      @ssffyy 3 years ago

      Hi Sid,
      This response is a bit late as I just read your comment... I guess your confusion comes from the fact that you are considering g multiplied by z, when in fact it's not a multiplication; rather, g is a function of z --> g(z), like f(x). So when you take the derivative of g(z) with respect to z, you end up with g'(z). Hope this clears up any doubts.

  • @ajaymalik9147
    @ajaymalik9147 5 years ago +2

    nice

  • @MaahirGupta
    @MaahirGupta 3 years ago

    You win.

  • @Nissearne12
    @Nissearne12 4 years ago

    Ahh, 7:07 explains my wonder about how it's ever possible to use the total loss value for backpropagation. The total loss value is only an absolute value (because of the square operation), so I wondered how that total loss value could ever be used to know which direction each weight knob should be turned (+ or -); it could not, since the sign information is lost in the total loss calculation! But it turns out that the sign of the error comes back into the equations when looking at the individual losses: d/da1(L) = 2(actual value - target).

  • @PritishMishra
    @PritishMishra 3 years ago +1

    0:00 - Introduction
    1:01 - Precap of the Video
    1:24 - Derivative of the Loss with respect to weights (Calculations)
    11:56 - Conclusion

    • @ramiro6322
      @ramiro6322 3 years ago +1

      I would also add
      5:45 First term (Loss with respect to Activation Output)
      7:36 Second term (Activation Output with respect to Input)
      8:52 Third term (Input with respect to weight)
      11:30 Putting it all together

    • @deeplizard
      @deeplizard  3 years ago

      Thank you both! Your timestamps have been added to the video description :D

    • @PritishMishra
      @PritishMishra 3 years ago

      @@deeplizard Thanks

  • @JordanMetroidManiac
    @JordanMetroidManiac 4 years ago

    How does bias fit into all of this?

    • @deeplizard
      @deeplizard  4 years ago

      Bias terms are updated in the same way as the weights. I elaborate more on this on the upcoming episode dedicated to bias: deeplizard.com/learn/video/HetFihsXSys

  • @evertonsantosdeandradejuni3787

    I feel like I can implement this myself in C++, is this normal?

  • @ashutoshshah864
    @ashutoshshah864 3 years ago +1

    🙏🏽💪🏽🤙🏽

  • @FelidInPetasus
    @FelidInPetasus 4 years ago

    Here's a thing that's unclear to me: You say that this process (which you do describe very neatly) can be applied to any weight in the network. However, shortly after 7:00, you conclude that the first term contains y_1. In video #2, you define this y_j as "the desired value of node j in the output layer L for a single training sample", i.e., the value a specific output neuron "ought to be". This works fine if you're looking at the weights connecting L-1 to L (the output layer), but doesn't make sense for the weights connecting, say, L-2 to L-1.
    What value would I use for y_j in a case like that?
    Edit: Thinking about it now: Am I correct in assuming that the first and second terms for the example you provided stay the same (even when looking at previous layers) and it's only the third term (specifically its weighted sum) that is "split up" into even more terms? This would remove the need to use a different y_j for other layers.
    Other than that: thank you for your videos

  • @WahranRai
    @WahranRai 3 years ago

    12:44 To avoid the 2 in the equation of the gradient: minimizing 0.5*C_0 is the same as minimizing C_0. Take 0.5*C_0 as the loss function, and when you take the derivative, the 2 disappears.
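    In symbols, the same point (using the series' squared-error loss for a single output node):

    C_0 = \tfrac{1}{2}\left(a^{(L)}_1 - y_1\right)^2
    \;\Longrightarrow\;
    \frac{\partial C_0}{\partial a^{(L)}_1} = a^{(L)}_1 - y_1

    Scaling the loss by a positive constant does not change which weights minimize it, so the factor of 2 can be dropped this way.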

  • @matharbarghi
    @matharbarghi 4 years ago

    The partial derivative of the loss function should be taken w.r.t. the weights of the last layer in the network. But you mentioned that we should take the derivative of the loss function with respect to all weights of the network. Please correct me if I am wrong; otherwise, correct it in your course. Thanks

    • @deeplizard
      @deeplizard  4 years ago

      You take the derivative of the loss function with respect to each weight. You then use each respective gradient to update each individual weight. For example, take the derivative of the loss with respect to weight w1. With the resulting gradient, update w1 to a new value. Do the same for w2, w3, etc...
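      As a tiny sketch of that update step (made-up learning rate and gradient values, not numbers from the video):

      lr = 0.01
      weights = {"w1": 0.5, "w2": -0.3, "w3": 0.8}
      grads = {"w1": 0.12, "w2": -0.05, "w3": 0.30}   # dLoss/dw computed for each weight

      for name in weights:
          weights[name] -= lr * grads[name]           # each weight steps against its own gradient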

  • @sinaasadi3800
    @sinaasadi3800 5 years ago

    Hi. Would you please answer my other comment? I posted it yesterday under another video from this playlist. And also, thanks a lot for your videos.

  • @abubakarali6399
    @abubakarali6399 3 years ago

    What degree do you have, and from which university?

  • @srijalshrestha7380
    @srijalshrestha7380 6 years ago

    Thanks a lot, I don't know when and how I will use this in the future, but I understood it very well. Thank you.

    • @deeplizard
      @deeplizard  6 years ago

      You're welcome, Srijal! I'm glad you were able to gain an understanding!

  • @aryanrahman3212
    @aryanrahman3212 2 years ago

    When she says g-prime, what she means is the derivative (or differentiation) of the activation function g. This function can literally be anything.

  • @jorgecelis8459
    @jorgecelis8459 3 years ago

    The only detail is that the number of nodes should be indexed for the general case, and then maybe another letter should be used for the number of examples =)

  • @user-sv1ew5ct5w
    @user-sv1ew5ct5w 5 years ago

    I feel sentdex style

  • @jeetenzhurlollz8387
    @jeetenzhurlollz8387 4 years ago

    far better than deeplearning.ai

  • @patrickryckman3867
    @patrickryckman3867 4 years ago

    8:22 you lost me. You said we just put this into the right side of the equation, but that's not the only thing you put into the right side of the equation.

  • @mechhyena6957
    @mechhyena6957 4 years ago

    I have no clue what is going on in this video...

  • @JordanMetroidManiac
    @JordanMetroidManiac 4 years ago +1

    Thicc

  • @kiarash7604
    @kiarash7604 4 years ago

    most of these videos are explaining the obvious