Backpropagation explained | Part 4 - Calculating the gradient

  • Added 24 Jul 2024
  • We're now on part 4 of our journey through understanding backpropagation. In our last video, we focused on how we can mathematically express certain facts about the training process. Now we're going to use those expressions to help us differentiate the loss of the neural network with respect to the weights.
    Recall from the video that covered the intuition for backpropagation that, for stochastic gradient descent to update the weights of the network, it first needs to calculate the gradient of the loss with respect to those weights. Calculating this gradient is exactly what we'll be focusing on in this video.
    We'll start by checking out the equation that backprop uses to differentiate the loss with respect to a weight in the network. We'll see that this equation is made up of multiple terms, so next we'll break down and focus on each of these terms individually. Lastly, we'll combine the results from each term to obtain the final result, which is the gradient of the loss function.
    🕒🦎 VIDEO SECTIONS 🦎🕒
    00:00 Welcome to DEEPLIZARD - Go to deeplizard.com for learning resources
    00:58 Agenda
    01:28 Derivative Calculations
    05:45 Calculation Breakdown - First term
    07:36 Calculation Breakdown - Second term
    08:52 Calculation Breakdown - Third term
    11:56 Summary
    13:56 Collective Intelligence and the DEEPLIZARD HIVEMIND
    💥🦎 DEEPLIZARD COMMUNITY RESOURCES 🦎💥
    👋 Hey, we're Chris and Mandy, the creators of deeplizard!
    👉 Check out the website for more learning material:
    🔗 deeplizard.com
    💻 ENROLL TO GET DOWNLOAD ACCESS TO CODE FILES
    🔗 deeplizard.com/resources
    🧠 Support collective intelligence, join the deeplizard hivemind:
    🔗 deeplizard.com/hivemind
    🧠 Use code DEEPLIZARD at checkout to receive 15% off your first Neurohacker order
    👉 Use your receipt from Neurohacker to get a discount on deeplizard courses
    🔗 neurohacker.com/shop?rfsn=648...
    👀 CHECK OUT OUR VLOG:
    🔗 / deeplizardvlog
    ❤️🦎 Special thanks to the following polymaths of the deeplizard hivemind:
    Tammy
    Mano Prime
    Ling Li
    🚀 Boost collective intelligence by sharing this video on social media!
    👀 Follow deeplizard:
    Our vlog: / deeplizardvlog
    Facebook: / deeplizard
    Instagram: / deeplizard
    Twitter: / deeplizard
    Patreon: / deeplizard
    YouTube: / deeplizard
    🎓 Deep Learning with deeplizard:
    Deep Learning Dictionary - deeplizard.com/course/ddcpailzrd
    Deep Learning Fundamentals - deeplizard.com/course/dlcpailzrd
    Learn TensorFlow - deeplizard.com/course/tfcpailzrd
    Learn PyTorch - deeplizard.com/course/ptcpailzrd
    Natural Language Processing - deeplizard.com/course/txtcpai...
    Reinforcement Learning - deeplizard.com/course/rlcpailzrd
    Generative Adversarial Networks - deeplizard.com/course/gacpailzrd
    🎓 Other Courses:
    DL Fundamentals Classic - deeplizard.com/learn/video/gZ...
    Deep Learning Deployment - deeplizard.com/learn/video/SI...
    Data Science - deeplizard.com/learn/video/d1...
    Trading - deeplizard.com/learn/video/Zp...
    🛒 Check out products deeplizard recommends on Amazon:
    🔗 amazon.com/shop/deeplizard
    🎵 deeplizard uses music by Kevin MacLeod
    🔗 / @incompetech_kmac
    ❤️ Please use the knowledge gained from deeplizard content for good, not evil.

Comments • 133

  • @deeplizard
    @deeplizard  6 years ago +11

    Backpropagation explained | Part 1 - The intuition
    czcams.com/video/XE3krf3CQls/video.html
    Backpropagation explained | Part 2 - The mathematical notation
    czcams.com/video/2mSysRx-1c0/video.html
    Backpropagation explained | Part 3 - Mathematical observations
    czcams.com/video/G5b4jRBKNxw/video.html
    Backpropagation explained | Part 4 - Calculating the gradient
    czcams.com/video/Zr5viAZGndE/video.html
    Backpropagation explained | Part 5 - What puts the “back” in backprop?
    czcams.com/video/xClK__CqZnQ/video.html
    Machine Learning / Deep Learning Fundamentals playlist: czcams.com/play/PLZbbT5o_s2xq7LwI2y8_QtvuXZedL6tQU.html
    Keras Machine Learning / Deep Learning Tutorial playlist: czcams.com/play/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL.html

    • @you_dont_know_me_sir
      @you_dont_know_me_sir 5 years ago

      deeplizard That Scientific Notebook you explain from in multiple videos seems great as a collection of notes to review when needed. Have you put it somewhere? GitHub?

    • @sampro454
      @sampro454 4 years ago

      5:35 shouldn't there be a da/dg in there?

    • @sampro454
      @sampro454 4 years ago

      Never mind, you involve g when differentiating a with respect to z later. Very nice.

  • @abhishek-shrm
    @abhishek-shrm 3 years ago +8

    So much professionalism in a YouTube video course that is free. Thank you for making these videos.

  • @naughtrussel5787
    @naughtrussel5787 4 years ago +33

    I've been stuck with backpropagation for several days. I've tried a bunch of resources, but you are the only one whose way of explaining this is exactly what I needed. Thanks a lot for doing these "unfolded" calculations and emphasizing _the purpose_ of doing this or that thing. This is what's often missing in ANN courses. Great job!

  • @Sikuq
    @Sikuq 2 years ago +4

    Although a complex issue, this presentation makes it much easier to understand. Thanks deeplizard.

  • @MalTimeTV
    @MalTimeTV a year ago +3

    These videos of yours on the mathematics of back-propagation are just incredible. Thank you very much.

  • @bildadatsegha6923
    @bildadatsegha6923 a year ago +3

    Awesome. I am learning Deep learning as a complete novice and you have been truly helpful. Thanks.
    I really love the simplicity of your lectures.

  • @edbshllanss
    @edbshllanss 3 years ago +2

    A year ago, I started learning deep learning, first through your content. I acquired some intuition, but this series as well as the Keras series went over my head because I was totally unfamiliar with programming and mathematics. But a year later, with basic knowledge of Python and mathematics like calculus, it has become much easier to follow the thread of your videos, and I feel I am finally standing at the starting point of machine learning. Thank you so much for your straightforward explanations!!!

  • @vaibhavkhobragade9773
    @vaibhavkhobragade9773 2 years ago +3

    Clear, concise, and perfect understanding. Thank you, Mandy!

  • @chintanmathia
    @chintanmathia 4 years ago +4

    This is the epitome of explaining such a difficult topic with such simplicity.
    Thanks a lot...
    I could not stop going through all 36 videos in one go... Amazing job, ma'am.

  • @pawarranger
    @pawarranger 5 years ago +11

    this is now my favourite ANN playlist, thanks a ton!

  • @BabisPlaysGuitar
    @BabisPlaysGuitar 5 years ago +7

    Awesome! All the advanced calculus and linear algebra classes that I took back in engineering school make sense now. Thank you very much!

  • @rabirajbanerjee3872
    @rabirajbanerjee3872 4 years ago +7

    After watching your video I could actually do the derivation all by myself, thanks for the intuition :)

  • @antoinetoussaint483
    @antoinetoussaint483 4 years ago +6

    Clear, precise, consistent. What a channel, thx.

  • @fatihandyildiz
    @fatihandyildiz 3 years ago +1

    Just wow! Normally, it's really hard for me to fully understand these derivations (even after watching multiple times), but you just made it happen at 1.5x speed. Thank you for offering this high-quality tutorial for free. Blessings to you.

  • @MJ2403
    @MJ2403 4 years ago +3

    You are a gem... I'm finally able to understand backpropagation, which I was struggling with like anything.

  • @shakyasarkar7143
    @shakyasarkar7143 4 years ago +1

    You are a legend, ma'am!!!
    Truly!!
    I have been searching for this complete backprop calculus derivation throughout all the YouTube videos until I came upon yours... I even looked at some Udemy courses. Nobody, I REPEAT NOBODY, has explained this full derivation, or even dared to.
    Thank you, ma'am!
    I owe you a lot!

  • @nikosips
    @nikosips 3 years ago +2

    Thank you for those videos! Your explanations are crisp and clear, and very helpful! You deserve many more subs!

  • @SafeBuster80
    @SafeBuster80 4 years ago +3

    Thank you for your videos of backpropagation, I now understand this subject as you explained it nicely and clearly (unlike my uni professor).

  • @TheMaidenReturns
    @TheMaidenReturns 4 years ago +1

    Wow.. just wow. I have been really struggling this year in uni with my A.I. module, as the teacher doesn't really explain things well. I really can't believe how simple and easy to understand you can make this topic. This series has saved me from failing a module this year, and it helped me learn so much about deep learning. Amazing content, well explained. Big up for this

  • @moizahmed8987
    @moizahmed8987 4 years ago +2

    Terrific video, thank you very much
    this is the first video that goes through backprop step by step

  • @bobhut8613
    @bobhut8613 4 years ago +1

    Thank you so much for this! I had been stuck trying to wrap my head around the maths for days and your videos really helped.

  • @joshuayudice8245
    @joshuayudice8245 4 years ago

    Seriously, you are a godsend. Thank you for creating these clear and methodical videos.

  • @EliorBY
    @EliorBY 3 years ago +2

    wow. what an amazing illustrative mathematical explanation. just what I was looking for. thank you very much deep lizard!

  • @weactweimpactcharityassoci3964

    this is now my favorite ANN playlist, thanks a ton!
    شكرا (Thank you)

  • @satnav1377
    @satnav1377 5 years ago +2

    Incredibly clear explanation, great vid once again!

  • @trankhanhang8151
    @trankhanhang8151 3 years ago +1

    So simple and elegant, I wish I found you sooner.

  • @Luis-fh8cv
    @Luis-fh8cv 6 years ago +1

    Thank you deeplizard, this is very helpful. I can code backpropagation just fine for ANNs that use the sigmoid function and MSE, but I've always struggled to follow the gradient descent and backprop math.

  • @databridgeconsultants9163

    Thank you so much, guys. This series is just the BEST ever made. It's legendary work done by you guys. I have read so many books, and even my prof was not able to make us understand how these things actually work step by step. All I understood in the past was to ditch this portion of neural networks. But now I can confidently explain what's going on inside a neural network. I have subscribed to the paid version of yours.

    • @deeplizard
      @deeplizard  4 years ago

      That's great to hear! Really happy that you gained new knowledge. Thank you for letting us know :)
      By the "paid version," are you referring to becoming a member of the deeplizard hivemind via Patreon?

  • @lancelotdsouza4705
    @lancelotdsouza4705 2 years ago +1

    Thanks so much, you made backpropagation a cakewalk.

  • @minhdang5132
    @minhdang5132 4 years ago +1

    Brilliant explanation. Thanks a lot!

  • @georgezhou1287
    @georgezhou1287 4 years ago +1

    Your work is a godsend. Thank you.

  • @AnandaKevin28
    @AnandaKevin28 3 years ago +2

    Just. Great. Explanation. Words are not enough to express it. Thanks a lot for the explanation! 😁

  • @EinsiJo
    @EinsiJo 4 years ago +1

    Extremely useful! Thank you!

  • @durgamanoja8179
    @durgamanoja8179 6 years ago +1

    I have gone through your series, and I must say you are AWESOME!! I could not understand the mathematics behind backpropagation from any website or video; you made it very clear. Thanks a lot. Please do make more videos like these.

    • @deeplizard
      @deeplizard  6 years ago

      Thank you, durga! I'm so happy to hear this 😄
      If you're also interested in implementing the neural network concepts from this Deep Learning Fundamentals series in code, check out both our Keras and TensorFlow.js series!
      Keras: czcams.com/play/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL.html
      TensorFlow.js: czcams.com/play/PLZbbT5o_s2xr83l8w44N_g3pygvajLrJ-.html

  • @gbyu7638
    @gbyu7638 3 years ago +1

    Such a clear explanation and calculation!

  • @rik43
    @rik43 3 years ago +1

    Finally I got it, thank you!

  • @aruchan9890
    @aruchan9890 8 months ago +1

    Thank you so much for this, really helpful!

  • @danielrodriguezgonzalez2982

    Can't say it enough, the best!

  • @todianmishtaku6249
    @todianmishtaku6249 4 years ago +1

    Excellent!!!

  • @luisluiscunha
    @luisluiscunha 10 months ago

    Very much appreciated. Very nice explanation.

  • @hairuiliu3446
    @hairuiliu3446 5 years ago +1

    very well explained, thanks

  • @samaryadav7208
    @samaryadav7208 6 years ago +2

    Great video. Waiting for the next part.

  • @andonglin8900
    @andonglin8900 6 years ago +2

    Easy to follow. Thanks a lot!

    • @deeplizard
      @deeplizard  6 years ago +1

      I'm glad you think so, Andong! And you're welcome!

  • @richarda1630
    @richarda1630 3 years ago +1

    everyone else has said it all :) Thanks so much!

    • @richarda1630
      @richarda1630 3 years ago

      to bolster my newbie mind :P I watched 3B1B also to help me understand what you discussed here :) czcams.com/video/Ilg3gGewQ5U/video.html

  • @freedmoresidume
    @freedmoresidume 2 years ago

    You are the best ❤️

  • @haadialiaqat4590
    @haadialiaqat4590 2 years ago +1

    Thank you so much for such a nice explanation. Please make more videos.

  • @Jxordan
    @Jxordan 6 years ago +1

    Thank you! Dedicating my midterm today to you.
    Also just a random tip, if you don't use cortana you can right click the "type here to search" and hide it

    • @deeplizard
      @deeplizard  6 years ago +1

      Thanks for the tip! How did your midterm go?

  • @khawarshahzad5721
    @khawarshahzad5721 6 years ago +1

    Hello deeplizard,
    great video!
    Can you please explain how the partial derivative of the loss would be calculated for a batch size greater than 1?
    Thanks.

    • @deeplizard
      @deeplizard  6 years ago

      Hey Khawar - Thanks!
      To summarize, you take the gradient of the loss with respect to a particular weight for _each_ input. You then average the resulting gradients and update the given weight with that average. This would be the case if you passed all the data to your network at once. If instead you were doing batch gradient descent, where you were passing mini-batches of data to your network at a time, then you would apply this same method to _each batch_ of data, rather than to all the data at once.
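      A quick sketch of that averaging step in code (not from the video; the toy one-weight model, names, and numbers here are made up, assuming a squared-error loss and sigmoid activation):

      import numpy as np

      def sigmoid(z):
          return 1.0 / (1.0 + np.exp(-z))

      def grad_for_sample(w, a_prev, y):
          # Gradient of the squared-error loss w.r.t. one weight, for one sample,
          # for a toy single-input output node: z = w * a_prev, a = g(z).
          z = w * a_prev
          a = sigmoid(z)
          return 2.0 * (a - y) * a * (1.0 - a) * a_prev

      w, lr = 0.5, 0.1
      samples = [(0.3, 1.0), (0.8, 0.0), (0.1, 1.0)]          # (a_prev, y) pairs

      avg_grad = np.mean([grad_for_sample(w, a_prev, y) for a_prev, y in samples])
      w -= lr * avg_grad                                      # one update using the averaged gradient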

  • @chickensalad1369
    @chickensalad1369 4 years ago

    High schooler here, armed only with add-maths-level calculus, entering the boss room. Had to spend almost 3 hours just on the entire backpropagation process, filling holes in my mathematics along the way (such as partial derivatives) with other online math tutorials. It was hard but worth it. These will be my final words as my brain liquefies and escapes from my ears, bye.....

  • @timxu1766
    @timxu1766 4 years ago +1

    thank you deeplizard!!!

  • @williamdowling
    @williamdowling 5 years ago +2

    Awesome videos, thanks! I am curious though what happens if a g function is not differentiable? I guess that is common, too, for example g(x) = {0 if x

  • @justchill99902
    @justchill99902 5 years ago +1

    Hey there! The daunting backprop math proof went through as smooth as butter after watching these 5 videos. Thank you so much.
    Sometimes I think the book itself is speaking lol. I think that one dislike is by mistake.
    Question - At 8:36, you talk about why we get the derivative as g prime. Could you please explain what it means, and its relation to a sub 1?
    PS: You are my most favourite YouTube channel. You earned it. I think your content is definitely making a difference in the world. :) Please carry on!

    • @deeplizard
      @deeplizard  5 years ago +2

      Hey Nirbhay - Apologies for the delayed response! Somehow this comment was tucked away, and I just came across it.
      Thank you so much for your kind remarks! Really glad to hear you're enjoying and learning from the content.
      For your question from 8:36:
      (I'll eliminate the superscripts and subscripts from my explanation below.)
      Our objective is to differentiate a with respect to z.
      Recall that a = g(z) by definition.
      Taking the derivative of a with respect to z means that we need to take the derivative of g(z) with respect to z.
      Since g is a function of z, this gives us g'(z) as the derivative.
      Let me know if this helps.
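      Written out (the same point in symbols; the sigmoid line is just one common example of g, not something the video requires):

      a^{(L)}_1 = g\left(z^{(L)}_1\right)
      \quad\Longrightarrow\quad
      \frac{\partial a^{(L)}_1}{\partial z^{(L)}_1}
      = \frac{d}{dz^{(L)}_1}\, g\left(z^{(L)}_1\right)
      = g'\left(z^{(L)}_1\right)
      % e.g. if g is the sigmoid, g(z) = 1/(1 + e^{-z}), then g'(z) = g(z)(1 - g(z))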

  • @jdmik
    @jdmik 6 years ago +1

    Thanks for the great videos! Just wondering if you were planning on doing a video on how backprop is applied in convnets?

    • @deeplizard
      @deeplizard  6 years ago

      Hey Johan - Glad you're liking the videos! I currently don't have this as an immediate topic to cover, but I will add it to my list to explore further as a possible future video.

  • @krishnaik06
    @krishnaik06 6 years ago +1

    Nice video. Can you please explain the backpropagation part using plain Python code instead of only Keras? I have written the feed-forward propagation part but was not able to write the code for backpropagation. Please help.

    • @deeplizard
      @deeplizard  6 years ago +1

      Thanks, Krish! I don't have any code I've written myself that implements the backprop math that I've illustrated in the set of backprop videos. When I searched online for backpropagation in Python though, I saw some open sourced resources that you might be able to check out to assist with your implementation. This is one of the top results that came back from my query: machinelearningmastery.com/implement-backpropagation-algorithm-scratch-python/
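      For anyone landing here later, a very small from-scratch sketch (my own rough illustration, not code from the video or the linked article), assuming a 2-2-1 network, sigmoid activations, squared-error loss, and no biases:

      import numpy as np

      def sigmoid(z):
          return 1.0 / (1.0 + np.exp(-z))

      rng = np.random.default_rng(0)
      W1 = rng.normal(size=(2, 2))             # weights: input layer -> hidden layer
      W2 = rng.normal(size=(1, 2))             # weights: hidden layer -> output layer
      x = np.array([0.5, -0.2])                # one training sample
      y = np.array([1.0])                      # its target
      lr = 0.1

      # Forward pass
      z1 = W1 @ x;  a1 = sigmoid(z1)
      z2 = W2 @ a1; a2 = sigmoid(z2)

      # Backward pass (chain rule, layer by layer)
      delta2 = 2.0 * (a2 - y) * a2 * (1.0 - a2)     # dC/dz2 for squared error + sigmoid
      dW2 = np.outer(delta2, a1)                    # dC/dW2
      delta1 = (W2.T @ delta2) * a1 * (1.0 - a1)    # dC/dz1, pushed back through W2
      dW1 = np.outer(delta1, x)                     # dC/dW1

      # Gradient-descent update
      W2 -= lr * dW2
      W1 -= lr * dW1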

  • @umair369
    @umair369 4 years ago

    Thanks this was very elaborate and thoroughly explained, however I was wondering how you would average the derivative with respect to one particular weight across ALL training examples. 11:40 is when you mention that. I am assuming the change in w_1_2 doesn't affect the loss for any other training example, is that true? Please let me know.

  • @Arjun-kt4by
    @Arjun-kt4by 4 years ago

    Hello, at 11:02 how did the derivative come out to be a2(L-1)? Are you considering a as a constant?

  • @r.balamurali8246
    @r.balamurali8246 4 years ago

    Thank you very much.
    At 12:22, does 'n' represent the number of training samples or the number of nodes in layer L? Could you please explain this?

  • @s25412
    @s25412 3 years ago

    11:46 Given a single weight, wouldn't there be multiple versions of that weight that correspond to each training sample? If so, on the right hand side where you sum over i, shouldn't w_12 be indexed with 'i' just like how you did it for C?

  • @harishh.s4701
    @harishh.s4701 2 years ago +1

    Hello,
    Thank you for sharing your knowledge with us. I really appreciate the effort put into these videos. This series on backpropagation clarified a lot of confusion and helped me to understand it more clearly. The explanation was clear and easy to follow. However, I have one small suggestion. In this video at the timestamp 11.53, the term 'n' is used to represent the number of training samples whereas in all the previous equations 'n' represents the number of neurons in a particular layer (Please correct me if I am wrong). Perhaps it would be better if you could use a different notation (like N) for the number of training samples to avoid confusion. Maybe it has already been updated. I apologize if this is a repetition. Otherwise, great work, Keep it up, and thanks a lot :)

  • @hussainbhavnagarwala2596
    @hussainbhavnagarwala2596 10 months ago

    Can you show the same example for a weight that is a few layers behind the output layer? I am not able to understand how we will sum the activations of each layer.

  • @thomasvinet6160
    @thomasvinet6160 6 years ago +2

    Great video, I just have a question: if we want to calculate the derivative for a weight in layer (L-2), will it be the same as for layer L-1, but replacing a with g(z), and so on? Thanks
    EDIT: didn't see the next video ;)
    These tutorials are very understandable, keep making them!!

    • @deeplizard
      @deeplizard  6 years ago

      Thanks, Thomas! So were you able to answer your question after watching the next video?

  • @wilmobaggins
    @wilmobaggins 4 years ago +3

    It might have been easier to follow the notation if you had shown the weight from node 2 to, say, node 3. The 1 looks a lot like an l to my aging eyes :)
    Thank you for the video, very helpful.

  • @RohitPrasad2512
    @RohitPrasad2512 5 years ago

    Can you add what happens if the weight is between two hidden layers? And also how to calculate the loss for that.

  • @tymothylim6550
    @tymothylim6550 3 years ago +1

    Thank you very much for this video! May I ask if the training sample refers to the "batch" in a given epoch? Thus, the average gradients calculated across all batches would be used for SGD?
    Thank you also for going through the mathematics step-by-step! It really helps to have someone go through the math, instead of just reading it on my own!

    • @deeplizard
      @deeplizard  3 years ago

      You're welcome Tymothy! Happy that you're enjoying the course.
      In this explanation, a sample refers to a single sample. However, most of the time, neural network APIs will calculate the gradients and do a weight update per batch. The per-batch update is referred to as "mini-batch gradient descent." I give a little note about it in the section of the blog below titled "Mini-Batch Gradient Descent":
      deeplizard.com/learn/video/U4WB9p6ODjM
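      Roughly, the per-batch update looks like this (a made-up toy model, not the blog's code):

      import numpy as np

      def grad_for_sample(w, x, y):
          # Placeholder per-sample gradient for a one-parameter model y_hat = w * x
          # with squared-error loss; stands in for a full backprop pass.
          return 2.0 * (w * x - y) * x

      X = np.linspace(0, 1, 12)
      Y = 3.0 * X                               # toy targets (the true w is 3.0)
      w, lr, batch_size = 0.0, 0.5, 4

      for start in range(0, len(X), batch_size):
          xb, yb = X[start:start + batch_size], Y[start:start + batch_size]
          avg_grad = np.mean([grad_for_sample(w, x, y) for x, y in zip(xb, yb)])
          w -= lr * avg_grad                    # one weight update per mini-batch
      print(w)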

  • @ericsonbitoon
    @ericsonbitoon 3 years ago

    Hi Mandy, do you have a good reference book to recommend?

  • @John-wx3zn
    @John-wx3zn 3 months ago

    Hi Mandy, how does the weight in L-1 connect to layer L? Doesn't L have its own weight?

  • @Loev06
    @Loev06 3 years ago

    Amazing video! I know you don't use biases in this series, but do you know the derivative of the cost function w.r.t. the biases?
    Edit: I think I found it, is it (dC0 / da(L)1) (da(L)1 / dZ(L)1) = 2(a(L)1 - y1)( g'(L) ( Z(L)1 ) )? (Basically the first two terms, because the third term is always equal to 1)
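    For later readers, the commenter's reasoning written out, assuming a bias enters as z = (weighted sum) + b, which this series leaves out:

    z^{(L)}_1 = \sum_{k} w^{(L)}_{1k}\, a^{(L-1)}_k + b^{(L)}_1
    \;\Longrightarrow\;
    \frac{\partial z^{(L)}_1}{\partial b^{(L)}_1} = 1,
    \qquad
    \frac{\partial C_0}{\partial b^{(L)}_1}
    = \frac{\partial C_0}{\partial a^{(L)}_1}\,
      \frac{\partial a^{(L)}_1}{\partial z^{(L)}_1}\,
      \frac{\partial z^{(L)}_1}{\partial b^{(L)}_1}
    = 2\left(a^{(L)}_1 - y_1\right)\, g'\left(z^{(L)}_1\right)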

  • @aamir122a
    @aamir122a 6 years ago +1

    In the future, you might look at doing videos on neural networks for reinforcement learning, approximating the value function and the policy function.

    • @deeplizard
      @deeplizard  6 years ago

      Thanks for the suggestion, Aamir!

  • @John-wx3zn
    @John-wx3zn 3 months ago

    Hi Mandy, since the output of a single node comes from the relu function, why isn't this output, a, written on the side of the arrow instead of the weight, w, when going from L-1 to L?

  • @Tntpker
    @Tntpker 5 years ago

    After thinking about it a bit, why is the expression @ 12:18 used where you sum all the partials of the cost function w.r.t. w12 _for all training examples_ and calculate an average partial derivative? I thought one would do this for batch gradient descent but not with stochastic gradient descent? Or am I seeing something completely wrong here?

    • @deeplizard
      @deeplizard  5 years ago

      Hey Tntpker - Yes, when _n_ is the number of samples in the entire training set, this is the case for _batch_ gradient descent. Also, if using _mini-batch_ gradient descent, which is normally what is done with most neural network APIs by default, then you could look at _n_ as being the number of training examples within a single batch, rather than the entire training set. With this, the gradient update would occur on a per-batch basis.

    • @Tntpker
      @Tntpker 5 years ago +1

      @@deeplizard Cheers!

  • @yeahorightbro
    @yeahorightbro 6 years ago +5

    Great video! How did you learn this stuff yourself?

    • @deeplizard
      @deeplizard  6 years ago +14

      Thanks, Daniel!
      For deep learning in general, I took a self-led study approach through a combination of online resources and exploring/building networks for my own use in personal projects. The online resources that I took the most away from were Jeremy Howard's and Rachel Thomas' fast.ai course, parts of Andrew Ng's Deep Learning and Machine Learning courses on Coursera, and Michael Nielsen's Neural Networks and Deep Learning book.
      In regards to the math- One of my degrees is in math, so that’s where that came from. :) Learning the math specific to neural networks was just a matter of applying the math that I already had experience with.

    • @zeus1082
      @zeus1082 6 years ago +1

      deeplizard I am doing the same as what you did, but I was not a math graduate; I just had a lot of interest in maths, so it's easy to learn these concepts.

    • @deeplizard
      @deeplizard  6 years ago +1

      Hey aneesh - That's cool! Thanks for sharing. Are you also following the same online resources I mentioned?

    • @zeus1082
      @zeus1082 6 years ago +1

      deeplizard No, except fast.ai; I am following Andrew Ng's videos, some online resources, and Udemy. Your tutorials are useful too. Like you said, my interest in math made these concepts easy. Keep posting videos like this.

  • @nourelislam8565
    @nourelislam8565 5 years ago

    Amazing explanation... but I just want to know: what is the purpose of taking the average of the loss for a certain weight over n training examples? I guess all we have to know is the change of the loss function throughout the training examples?

    • @deeplizard
      @deeplizard  5 years ago +1

      Hey Nour - It's because we want to know the average loss across all samples. This will tell us how our model performs on average across the entire data set.

  • @AndreaCatania
    @AndreaCatania 5 years ago

    Sorry if this question is stupid, but I don't understand exactly what the loss means. Knowing the loss for weight W12, how can I update the related weight?
    W12 += LOSS12 does not seem correct to me.

  • @money_wins_controls
    @money_wins_controls 5 years ago

    Guys, please help.
    At 8:44, I am curious why the g function is not added, i.e. g'·z + g (according to the product rule). Why did they neglect g after differentiation?

    • @ssffyy
      @ssffyy 3 years ago

      Hi Sid,
      This response is a bit late as I just read your comment... I guess your confusion comes from the fact that you are considering g multiplied by z, when in fact it's not a multiplication; rather, g is a function of z --> g(z), like f(x). So when you take the derivative of g(z) with respect to z, you end up with g'(z). Hope this clears up any doubts.

  • @ajaymalik9147
    @ajaymalik9147 5 years ago +2

    nice

  • @MaahirGupta
    @MaahirGupta 3 years ago

    You win.

  • @Nissearne12
    @Nissearne12 4 years ago

    Ahh, 7:07 explains my wonder about how it's ever possible to use the total loss value for backpropagation. The total loss value is only an absolute value (because of the square operation), so I wondered how that total loss value could ever be used to know which direction each weight knob should be turned (+ or -); it could not, since the sign information is lost in the total loss calculation! But it turns out that the sign of the error comes back into the equations when looking at the individual losses: d/da1(L) = 2(actual value - target).

  • @PritishMishra
    @PritishMishra 3 years ago +1

    0:00 - Introduction
    1:01 - Precap of the Video
    1:24 - Derivative of the Loss with respect to weights (Calculations)
    11:56 - Conclusion

    • @ramiro6322
      @ramiro6322 3 years ago +1

      I would also add
      5:45 First term (Loss with respect to Activation Output)
      7:36 Second term (Activation Output with respect to Input)
      8:52 Third term (Input with respect to weight)
      11:30 Putting it all together

    • @deeplizard
      @deeplizard  3 years ago

      Thank you both! Your timestamps have been added to the video description :D

    • @PritishMishra
      @PritishMishra 3 years ago

      @@deeplizard Thanks

  • @JordanMetroidManiac
    @JordanMetroidManiac 4 years ago

    How does bias fit into all of this?

    • @deeplizard
      @deeplizard  4 years ago

      Bias terms are updated in the same way as the weights. I elaborate more on this on the upcoming episode dedicated to bias: deeplizard.com/learn/video/HetFihsXSys

  • @evertonsantosdeandradejuni3787

    I feel like I can implement this myself in C++, is this normal?

  • @ashutoshshah864
    @ashutoshshah864 3 years ago +1

    🙏🏽💪🏽🤙🏽

  • @FelidInPetasus
    @FelidInPetasus 4 years ago

    Here's a thing that's unclear to me: You say that this process (which you do describe very neatly) can be applied to any weight in the network. However, shortly after 7:00, you conclude that the first term contains y_1. In video #2, you define this y_j as "the desired value of node j in the output layer L for a single training sample", i.e., the value a specific output neuron "ought to be". This works fine if you're looking at the weights connecting L-1 to L (the output layer), but doesn't make sense for the weights connecting, say, L-2 to L-1.
    What value would I use for y_j in a case like that?
    Edit: Thinking about it now: Am I correct in assuming that the first and second terms for the example you provided stay the same (even when looking at previous layers) and it's only the third term (specifically its weighted sum) that is "split up" into even more terms? This would remove the need to use a different y_j for other layers.
    Other than that: thank you for your videos

  • @WahranRai
    @WahranRai 3 years ago

    12:44 To avoid the 2 in the equation of the gradient: minimizing 0.5*C_0 is the same as minimizing C_0. Take 0.5*C_0 as the loss function, and when you take the derivative, the 2 disappears.
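    In symbols, the same point (using the series' squared-error loss for a single output node):

    C_0 = \tfrac{1}{2}\left(a^{(L)}_1 - y_1\right)^2
    \;\Longrightarrow\;
    \frac{\partial C_0}{\partial a^{(L)}_1} = a^{(L)}_1 - y_1

    Scaling the loss by a positive constant does not change which weights minimize it, so the factor of 2 can be dropped this way.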

  • @matharbarghi
    @matharbarghi 4 years ago

    The partial derivative of the loss function should be taken w.r.t. the weights of the last layer in the network. But you mentioned that we should take the derivative of the loss function with respect to all weights of the network. Please correct me if I am wrong; otherwise, correct it in your course. Thanks

    • @deeplizard
      @deeplizard  4 years ago

      You take the derivative of the loss function with respect to each weight. You then use each respective gradient to update each individual weight. For example, take the derivative of the loss with respect to weight w1. With the resulting gradient, update w1 to a new value. Do the same for w2, w3, etc...
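      As a tiny sketch of that update step (made-up learning rate and gradient values, not numbers from the video):

      lr = 0.01
      weights = {"w1": 0.5, "w2": -0.3, "w3": 0.8}
      grads = {"w1": 0.12, "w2": -0.05, "w3": 0.30}   # dLoss/dw computed for each weight

      for name in weights:
          weights[name] -= lr * grads[name]           # each weight steps against its own gradient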

  • @sinaasadi3800
    @sinaasadi3800 5 years ago

    Hi. Would you please answer my other comment? I posted it yesterday under another video from this playlist. And also, thanks a lot for your videos.

  • @abubakarali6399
    @abubakarali6399 3 years ago

    What degree do you have, and from which university?

  • @srijalshrestha7380
    @srijalshrestha7380 6 years ago

    Thanks a lot, I don't know when and how I will use this in the future, but I understood it very well. Thank you.

    • @deeplizard
      @deeplizard  6 years ago

      You're welcome, Srijal! I'm glad you were able to gain an understanding!

  • @aryanrahman3212
    @aryanrahman3212 2 years ago

    When she says g-prime, what she means is the derivative (or differentiation) of the activation function g. This function can literally be anything.

  • @jorgecelis8459
    @jorgecelis8459 3 years ago

    The only detail is that the number of nodes should be indexed for the general case, and then maybe another letter should be used for the number of examples =)

  • @user-sv1ew5ct5w
    @user-sv1ew5ct5w 5 years ago

    I feel sentdex style

  • @jeetenzhurlollz8387
    @jeetenzhurlollz8387 4 years ago

    far better than deeplearning.ai

  • @patrickryckman3867
    @patrickryckman3867 4 years ago

    8:22 you lost me. You said we just put this into the right side of the equation, but that's not the only thing you put into the right side of the equation.

  • @mechhyena6957
    @mechhyena6957 4 years ago

    I have no clue what is going on in this video...

  • @JordanMetroidManiac
    @JordanMetroidManiac 4 years ago +1

    Thicc

  • @kiarash7604
    @kiarash7604 4 years ago

    most of these videos are explaining the obvious