Backpropagation explained | Part 3 - Mathematical observations
- Date added: 9 Jul 2024
- In the previous video, we focused on the mathematical notation and definitions that we'll be using going forward to show how backpropagation mathematically calculates the gradient of the loss function. In this video, we'll start making use of what we learned and applying it, so it's crucial that you have a full understanding of everything we covered in that video first.
Here, we're going to be making some mathematical observations about the training process of a neural network. The observations we'll be making are facts that we already know conceptually; we'll now just be expressing them mathematically. We're making these observations because the math for backprop that comes next, particularly the differentiation of the loss function with respect to the weights, will make use of them.
We're first going to start out by making an observation regarding how we can mathematically express the loss function. We're then going to make observations around how we express the input and the output for any given node mathematically. And lastly, we'll observe what method we'll be using to differentiate the loss function via backpropagation.
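As a sketch of the composition observation covered in the video (using the notation from part 2 of the series; here $g$ is the activation function and $y_j$ is the desired output of output node $j$):

```latex
C_0 = \sum_j \left(a_j^{(L)} - y_j\right)^2, \qquad
a_j^{(L)} = g\!\left(z_j^{(L)}\right), \qquad
z_j^{(L)} = \sum_k w_{jk}^{(L)} \, a_k^{(L-1)}
```

So the loss is a composition of functions, $C_0\big(a_j^{(L)}\big(z_j^{(L)}\big(w_{jk}^{(L)}\big)\big)\big)$, which is why differentiating it with respect to the weights requires the chain rule.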
🕒🦎 VIDEO SECTIONS 🦎🕒
00:00 Welcome to DEEPLIZARD - Go to deeplizard.com for learning resources
01:15 Outline for the episode
01:44 Mathematical Observations
05:30 Expressing the loss as a composition of functions
10:15 Summary
10:56 Collective Intelligence and the DEEPLIZARD HIVEMIND
💥🦎 DEEPLIZARD COMMUNITY RESOURCES 🦎💥
👋 Hey, we're Chris and Mandy, the creators of deeplizard!
👉 Check out the website for more learning material:
🔗 deeplizard.com
💻 ENROLL TO GET DOWNLOAD ACCESS TO CODE FILES
🔗 deeplizard.com/resources
🧠 Support collective intelligence, join the deeplizard hivemind:
🔗 deeplizard.com/hivemind
🧠 Use code DEEPLIZARD at checkout to receive 15% off your first Neurohacker order
👉 Use your receipt from Neurohacker to get a discount on deeplizard courses
🔗 neurohacker.com/shop?rfsn=648...
👀 CHECK OUT OUR VLOG:
🔗 / deeplizardvlog
❤️🦎 Special thanks to the following polymaths of the deeplizard hivemind:
Tammy
Mano Prime
Ling Li
🚀 Boost collective intelligence by sharing this video on social media!
👀 Follow deeplizard:
Our vlog: / deeplizardvlog
Facebook: / deeplizard
Instagram: / deeplizard
Twitter: / deeplizard
Patreon: / deeplizard
YouTube: / deeplizard
🎓 Deep Learning with deeplizard:
Deep Learning Dictionary - deeplizard.com/course/ddcpailzrd
Deep Learning Fundamentals - deeplizard.com/course/dlcpailzrd
Learn TensorFlow - deeplizard.com/course/tfcpailzrd
Learn PyTorch - deeplizard.com/course/ptcpailzrd
Natural Language Processing - deeplizard.com/course/txtcpai...
Reinforcement Learning - deeplizard.com/course/rlcpailzrd
Generative Adversarial Networks - deeplizard.com/course/gacpailzrd
🎓 Other Courses:
DL Fundamentals Classic - deeplizard.com/learn/video/gZ...
Deep Learning Deployment - deeplizard.com/learn/video/SI...
Data Science - deeplizard.com/learn/video/d1...
Trading - deeplizard.com/learn/video/Zp...
🛒 Check out products deeplizard recommends on Amazon:
🔗 amazon.com/shop/deeplizard
🎵 deeplizard uses music by Kevin MacLeod
🔗 / @incompetech_kmac
❤️ Please use the knowledge gained from deeplizard content for good, not evil.
Backpropagation explained | Part 1 - The intuition
czcams.com/video/XE3krf3CQls/video.html
Backpropagation explained | Part 2 - The mathematical notation
czcams.com/video/2mSysRx-1c0/video.html
Backpropagation explained | Part 3 - Mathematical observations
czcams.com/video/G5b4jRBKNxw/video.html
Backpropagation explained | Part 4 - Calculating the gradient
czcams.com/video/Zr5viAZGndE/video.html
Backpropagation explained | Part 5 - What puts the “back” in backprop?
czcams.com/video/xClK__CqZnQ/video.html
Machine Learning / Deep Learning Fundamentals playlist: czcams.com/play/PLZbbT5o_s2xq7LwI2y8_QtvuXZedL6tQU.html
Keras Machine Learning / Deep Learning Tutorial playlist: czcams.com/play/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL.html
Andrew Ng - "I can explain everything"
Deep lizard - "Hold my backpropagation"
"if you can't explain it simply, you don't understand it well enough" - Einstein
@@ssffyy He does, but the problem with a lot of professors is that they assume students already know everything, when in reality they know nothing. Sometimes I wonder if we pay for college just for the "tag" and not for the teaching.
These videos are awesome. Finally, someone who can break the steps down sufficiently for those less fluent in maths to grasp easily. Great work.
I just want to say thanks. I've seen the 3Blue1Brown videos, some of Ng's videos and several different articles. Yours is the first content that is allowing me to get a handle on understanding the math behind back propagation.
You're welcome, Scott! Glad to hear that :D Thanks for letting me know.
Thank you very much for this video! I learnt how to explain complicated math to others through your simple-to-understand series! I also learnt how to understand the loss as a function of all those things!
A crisp and precise description, down to the sub/superscripts.
Thanks for the upload
Such a great video. I understood everything perfectly. You guys definitely need more subs.
I am once again commenting -"I love you :)" Awesome lectures!!
Speechless! You are kind...
Great job. SUGGESTION: when you mention one of the indices, you could show where it is in the neural network.
awesome style of explanation! all thumbs up!
That was a really good explanation. It really helped me. Thanks a lot.
Hey, this really helped me get a better understanding of Andrew's course. Thank you.
Great! Thanks!
This is the best!
Great work, really helped me a lot :)
Superb! Really liked it.
Hi deeplizard, before posting my question I first want to thank you (again) for these amazing videos and this valuable playlist; it's one of the main resources I use in my journey to mastering deep learning. I also think it's very important to know the mathematical foundations of neural nets to actually understand them in the best way possible. My question is (it's a bit long, I hope it won't bother you): let's say we have a 3-by-1 neural net (an input layer with 3 neurons and an output layer with one neuron, no hidden layer). I have no problem calculating the outputs of forward and back propagation when feeding the neural net one training sample at a time (namely feature1, feature2, feature3 inputs), and I know exactly how my initial weights get optimized. The problem I have is when feeding the NN multiple training inputs at a time; here, I don't know exactly how the initial weights get optimized.
I would be grateful if you could explain how the initial weights get modified when feeding the NN with multiple training inputs.
(For example we have training inputs of 3 × 3 Matrix.
[[195, 90, 41],
[140, 50, 30],
[180, 85, 43]]
the first column is the height, 2nd: the weight, 3rd: shoe size, where we feed the NN with the first row then the second and the third row).
I know that to calculate the new weights when feeding the NN with one training sample we rely on this formula:
New_weights = Initial_weights - learning_rate × (derivative of the loss function wrt the weights)
But when we feed the NN more than one training example, which formula do we use? Do we calculate the average of all the dw (derivatives of the loss function wrt the weights), or do we sum all of them, multiply by the learning rate, and subtract that from the initial weights, or what?
I'm a bit confused here.
Thanks in advance.
Hey Lightning Blade - Glad to see you're progressing through the content! In regards to your question, your first thought is correct. We calculate the average derivative of the loss over all training samples. I touch on this in the next video starting at 11:56: czcams.com/video/Zr5viAZGndE/video.html
Let me know if this helps clarify!
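To make the averaging answer above concrete, here is a minimal NumPy sketch of one gradient-descent update on the 3-by-1 example from the question. The targets, initial weights, and learning rate are made-up values for illustration, a linear (identity) activation is assumed for simplicity, and the loss is taken as the mean squared error over the batch:

```python
import numpy as np

# The 3x3 batch of training inputs from the question above.
X = np.array([[195.0, 90.0, 41.0],
              [140.0, 50.0, 30.0],
              [180.0, 85.0, 43.0]])   # each row: height, weight, shoe size
y = np.array([1.0, 0.0, 1.0])         # made-up targets for illustration
w = np.array([0.01, 0.02, -0.03])     # made-up initial weights
lr = 1e-5                             # made-up learning rate

# Forward pass with a linear (identity) activation for simplicity.
preds = X @ w                          # shape (3,)

# Per-sample squared-error loss: (pred_i - y_i)^2, so
# d(loss_i)/dw = 2 * (pred_i - y_i) * x_i. Compute one gradient per sample.
grads = 2 * (preds - y)[:, None] * X   # shape (3, 3)

# Average the per-sample derivatives across the batch, as in the answer above.
avg_grad = grads.mean(axis=0)          # shape (3,)

# Single update: New_weights = Initial_weights - learning_rate * avg_grad
w_new = w - lr * avg_grad
```

With a small enough learning rate, this averaged update reduces the batch loss; summing instead of averaging gives the same direction, just scaled by the batch size.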
keep up great videos :)
This video is great!
Thank you, pan!
you're the best
@deeplizard, thanks for this great series. I have a query: at 8:03, you represent the input for node j as a function of all the weights connected to j. But my understanding is that the input is the weighted sum of the activation outputs of the previous layer. So shouldn't the activation output of the previous layer also be considered when representing the input function?
Our end goal is getting the derivative of the loss with respect to weights, not the output activations of the previous layer (for the reasons explained in the part 1 video). You could definitely make a valid expression using the output activations of the previous layer (good spot!); however, it is not useful for the goal at hand. Also, remember that the output activations are directly influenced by the weights, so they don't need to be in the expression either. Apologies for resurrecting, hopefully someone finds this useful, as it confused me at first too.
This video is giving me the urge to make a 3-node-per-layer, 3-layer neural network in Excel
C(sub zero) is the loss of a particular sample. Suppose we have two classes (male and female) and four samples (two for each class). Will C(sub zero) represent the loss for the two female samples or the two male samples? Please explain this concept. Thanks.
Hi. Why do you say n-1 in the cost function?
Amazing Stuff! The way you break down the math is really neat. I love the way you spend time in walking through the notations, their meanings, and what they stand for. Would love to see a more comprehensive series on the math behind various loss functions, regularization techniques and maybe in general concepts from Ian Goodfellow's Deep Learning Book
{
"question": "Use of the chain rule is required because:",
"choices": [
"the loss function is a composition of functions.",
"of the vast number of weights.",
"of the sign of the gradient.",
"the gradient must flow backward."
],
"answer": "the loss function is a composition of functions.",
"creator": "Chris",
"creationDate": "2020-04-17T17:32:03.668Z"
}
More great questions, thanks Chris!
Just added your question to deeplizard.com/learn/video/G5b4jRBKNxw :)
0:00 Introduction
1:15 Outline for the episode
1:44 Observations
5:30 Expressing C0 as a composition of functions
10:15 Next video and Outro
Added to the description. Thanks so much!
@@deeplizard Thank you for the videos, they're great!
ETA on part 4? I have a midterm Wednesday :/
Hey Jordan - I'm _hoping_ to have it released by Tuesday evening.
Thank you!
Hey deeplizard, this video made all my doubts on this topic go away :). However- just asking- what happened to the bias of each neuron? Should z(j) not be sum[w(jk) × a(k)] + bias(j)?
Yes, bias was eliminated (or assumed to be 0) here for simplicity since we hadn't yet covered bias in the course. With bias included, your assumption for the calculation of z(j) is correct. We cover bias in a later episode here:
czcams.com/video/HetFihsXSys/video.html
@@deeplizard thank you so much for replying!
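Following the bias discussion above, here is a minimal sketch of the z(j) calculation with the bias term included, as confirmed in the reply. All numeric values are made up for illustration, and sigmoid is assumed for the activation g:

```python
import numpy as np

# Made-up values for a single node j in layer l.
a_prev = np.array([0.5, 0.1, 0.9])   # activation outputs a_k of layer l-1
w_j = np.array([0.2, -0.4, 0.7])     # weights w_jk into node j
b_j = 0.1                            # bias for node j

# z_j = sum_k( w_jk * a_k ) + b_j
z_j = np.dot(w_j, a_prev) + b_j

# Node j's output is then a_j = g(z_j); sigmoid is assumed here.
a_j = 1.0 / (1.0 + np.exp(-z_j))
```

Setting b_j to 0 recovers the bias-free form used in the video.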
I assume the bias is implicit in w?
Wouldn't it be (yi - aj(l))^2 instead of (aj(l) - yi)^2 ?
Maybe it is somewhat confusing to use subscript k for layer l-1 and j for layer l, but not use another subscript for the output layer L (j is also used there)? Did I miss something?
Same here. I think another subscript should be used. For layer l-1 it's k, for layer l it's j, and for layer L it's j again? Shouldn't it be a different index letter?
It would be great to actually introduce the Einstein summation convention instead of the sum. It looks a lot neater and less clumsy. Otherwise, thanks for the fantastic explanation.
Hi, it's a great explanation, thanks a lot. I have a quick question (9:05): zj(L), which is the input to the activation function aj(L), is a function of the weights wj(L) and the activation output ak(l-1) of layer l-1, right?
If that is correct, then C0 = C0(aj(L)(zj(L)(wj(L), ak(l-1))))
Hey Chavan - Yes, that's right.
Notation wise, however, if we include ak(l-1) as you did above, then that would lead us to needing to express ak(l-1) as a function of the weights and the input of the previous layer, l-2. Then, we'd go through the same process again of expressing ak(l-2) as a function of the weights and the input of the previous layer, l-3. We'd continue this over and over until we reached the start of the network.
This is indeed correct, but it gets a little messy with the notation if we keep expressing each function as a function of a function of a function of a ...
I illustrate the concept and the math behind this idea in part 5 of the backpropagation series here: czcams.com/video/xClK__CqZnQ/video.html
@@deeplizard Awesome, that was the answer I was looking for. Great explanation BTW.
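To illustrate the thread above numerically, here is a minimal sketch of the composition C0(a(z(w))) for a single output node, with its chain-rule derivative checked against a finite-difference approximation. Sigmoid activation and all numeric values are assumptions for illustration, and the previous layer's activation output is held fixed, as discussed in the reply:

```python
import numpy as np

# Made-up values for a single weight into a single output node.
a_prev = 0.6          # activation output a_k from layer L-1 (held fixed)
w = 0.5               # weight w_jk into the output node
y = 1.0               # desired output

# Forward pass: the composition C0(a(z(w))).
z = w * a_prev                      # z = w * a_prev
a = 1.0 / (1.0 + np.exp(-z))        # a = g(z), sigmoid assumed
C0 = (a - y) ** 2                   # C0 = (a - y)^2

# Chain rule: dC0/dw = dC0/da * da/dz * dz/dw
dC_da = 2 * (a - y)
da_dz = a * (1 - a)                 # derivative of the sigmoid
dz_dw = a_prev
grad = dC_da * da_dz * dz_dw

# Sanity check against a central finite-difference approximation.
eps = 1e-6
def C(w_):
    a_ = 1.0 / (1.0 + np.exp(-(w_ * a_prev)))
    return (a_ - y) ** 2
numeric = (C(w + eps) - C(w - eps)) / (2 * eps)
```

The two values agree, which is the point of the composition observation: each factor in the product comes from one layer of the composition.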
The loss function is the same as in linear regression, and derivatives are also used in LR. So how is a NN different from linear regression?
We use a non-linear activation function in a NN to introduce non-linearity. You can also think of linear regression as having a linear activation in the final output layer.
At this point you should have given an example of the activation function g(). Also, it would be helpful to have a diagram/illustration on the right while you are explaining.
Hey Transolve - In part 4 and 5 of the backprop videos, I use a diagram to illustrate the math. Be sure to check those out!
Also, here is our video/blog on activation functions if you're interested: deeplizard.com/learn/video/m0pIlLfpXWE
Any activation function can be substituted in for g().
I later saw the illustration in the next video (part 5) and now see you have a separate video on activation functions.
Damn......
Found these too basic. I'm trying to train a CNN on an artist's drawings to classify different features from them. I don't think formulas alone are going to be of any help; these are in every DL lecture. Can you make a lecture on training with features of an image?
So I assume n is the number of nodes in a layer. That's the only logical answer that works here.
That's correct.
video good, why lizard?
*I could a tale unfold whose lightest word*
*Would harrow up thy soul.*
👻🦎
HELP
Very Complicated..
Change the name of the channel.
Why?