Backpropagation explained | Part 3 - Mathematical observations

  • Published 9 Jul 2024
  • In the previous video, we focused on the mathematical notation and definitions that we'll be using going forward to show how backpropagation works mathematically to calculate the gradient of the loss function. In this video, we'll start making use of what we learned and applying it, so it's crucial that you have a full understanding of everything covered in that video first.
    Here, we're going to make some mathematical observations about the training process of a neural network. The observations are facts we already know conceptually; we'll now just express them mathematically. We're making them because the math for backprop that comes next, particularly the differentiation of the loss function with respect to the weights, will make use of these observations.
    We'll first make an observation about how we can express the loss function mathematically. We'll then make observations about how we express the input and the output of any given node mathematically. And lastly, we'll note the method we'll be using to differentiate the loss function via backpropagation.
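    As a rough preview of the "composition of functions" observation, here is a minimal sketch in LaTeX using the notation from the previous episode (squared-error loss, bias omitted as in the video; the exact indexing used on screen may differ slightly):

    C_0 = \sum_{j=0}^{n-1} \left( a_j^{(L)} - y_j \right)^2, \qquad
    a_j^{(L)} = g\!\left( z_j^{(L)} \right), \qquad
    z_j^{(L)} = \sum_{k} w_{jk}^{(L)} \, a_k^{(L-1)}

    \implies \quad C_0 = C_0\!\left( a_j^{(L)}\!\left( z_j^{(L)}\!\left( w_{jk}^{(L)} \right) \right) \right)

    Because the loss is a composition of functions of the weights, differentiating it with respect to a weight calls for the chain rule, which is exactly what the upcoming episodes apply.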
    🕒🦎 VIDEO SECTIONS 🦎🕒
    00:00 Welcome to DEEPLIZARD - Go to deeplizard.com for learning resources
    01:15 Outline for the episode
    01:44 Mathematical Observations
    05:30 Expressing the loss as a composition of functions
    10:15 Summary
    10:56 Collective Intelligence and the DEEPLIZARD HIVEMIND
    💥🦎 DEEPLIZARD COMMUNITY RESOURCES 🦎💥
    👋 Hey, we're Chris and Mandy, the creators of deeplizard!
    👉 Check out the website for more learning material:
    🔗 deeplizard.com
    💻 ENROLL TO GET DOWNLOAD ACCESS TO CODE FILES
    🔗 deeplizard.com/resources
    🧠 Support collective intelligence, join the deeplizard hivemind:
    🔗 deeplizard.com/hivemind
    🧠 Use code DEEPLIZARD at checkout to receive 15% off your first Neurohacker order
    👉 Use your receipt from Neurohacker to get a discount on deeplizard courses
    🔗 neurohacker.com/shop?rfsn=648...
    👀 CHECK OUT OUR VLOG:
    🔗 / deeplizardvlog
    ❤️🦎 Special thanks to the following polymaths of the deeplizard hivemind:
    Tammy
    Mano Prime
    Ling Li
    🚀 Boost collective intelligence by sharing this video on social media!
    👀 Follow deeplizard:
    Our vlog: / deeplizardvlog
    Facebook: / deeplizard
    Instagram: / deeplizard
    Twitter: / deeplizard
    Patreon: / deeplizard
    CZcams: / deeplizard
    🎓 Deep Learning with deeplizard:
    Deep Learning Dictionary - deeplizard.com/course/ddcpailzrd
    Deep Learning Fundamentals - deeplizard.com/course/dlcpailzrd
    Learn TensorFlow - deeplizard.com/course/tfcpailzrd
    Learn PyTorch - deeplizard.com/course/ptcpailzrd
    Natural Language Processing - deeplizard.com/course/txtcpai...
    Reinforcement Learning - deeplizard.com/course/rlcpailzrd
    Generative Adversarial Networks - deeplizard.com/course/gacpailzrd
    🎓 Other Courses:
    DL Fundamentals Classic - deeplizard.com/learn/video/gZ...
    Deep Learning Deployment - deeplizard.com/learn/video/SI...
    Data Science - deeplizard.com/learn/video/d1...
    Trading - deeplizard.com/learn/video/Zp...
    🛒 Check out products deeplizard recommends on Amazon:
    🔗 amazon.com/shop/deeplizard
    🎵 deeplizard uses music by Kevin MacLeod
    🔗 / @incompetech_kmac
    ❤️ Please use the knowledge gained from deeplizard content for good, not evil.

Comments • 66

  • @deeplizard
    @deeplizard  6 years ago +9

    Backpropagation explained | Part 1 - The intuition
    czcams.com/video/XE3krf3CQls/video.html
    Backpropagation explained | Part 2 - The mathematical notation
    czcams.com/video/2mSysRx-1c0/video.html
    Backpropagation explained | Part 3 - Mathematical observations
    czcams.com/video/G5b4jRBKNxw/video.html
    Backpropagation explained | Part 4 - Calculating the gradient
    czcams.com/video/Zr5viAZGndE/video.html
    Backpropagation explained | Part 5 - What puts the “back” in backprop?
    czcams.com/video/xClK__CqZnQ/video.html
    Machine Learning / Deep Learning Fundamentals playlist: czcams.com/play/PLZbbT5o_s2xq7LwI2y8_QtvuXZedL6tQU.html
    Keras Machine Learning / Deep Learning Tutorial playlist: czcams.com/play/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL.html

  • @rewangtm
    @rewangtm 4 years ago +24

    Andrew Ng - "I can explain everything"
    Deep lizard - "Hold my backpropagation"

    • @ssffyy
      @ssffyy 3 years ago

      "if you can't explain it simply, you don't understand it well enough" - Einstein

    • @sohailape
      @sohailape 2 years ago +2

      @@ssffyy He does, but the problem with a lot of professors is that they assume students already know everything, when in reality they know nothing. Sometimes I wonder if we pay for college just for the "tag" and not for the teaching.

  • @jamestuckett5285
    @jamestuckett5285 5 years ago +27

    These videos are awesome. Finally, someone who can break the steps down sufficiently for those less fluent in maths to grasp easily. Great work.

  • @scottthornley5405
    @scottthornley5405 5 years ago +26

    I just want to say thanks. I've seen the 3Blue1Brown videos, some of Ng's videos and several different articles. Yours is the first content that is allowing me to get a handle on understanding the math behind back propagation.

    • @deeplizard
      @deeplizard  5 years ago

      You're welcome, Scott! Glad to hear that :D Thanks for letting me know.

  • @tymothylim6550
    @tymothylim6550 3 years ago +2

    Thank you very much for this video! I learnt how to explain complicated math to others through your simple-to-understand series! I also learnt how to understand the loss as a function of all those things!

  • @Uditsinghparihar
    @Uditsinghparihar 5 years ago +1

    A crisp and precise description, right down to the sub/superscripts.
    Thanks for the upload

  • @kushagrachaturvedy2821
    @kushagrachaturvedy2821 4 years ago +4

    Such a great video. I understood everything perfectly. You guys definitely need more subs.

  • @ujjwalkumar8173
    @ujjwalkumar8173 3 years ago +2

    I am once again commenting: "I love you :)" Awesome lectures!!

  • @datasciencestory15
    @datasciencestory15 4 years ago +1

    Speechless! You are kind...

  • @ricardofrancalaccisavarisr4364

    Great job. SUGGESTION: when you mention one of the indices, you could show where it is in the neural network.

  • @mariodurndorfer6996
    @mariodurndorfer6996 5 years ago +1

    awesome style of explanation! all thumbs up!

  • @gero8049
    @gero8049 3 years ago +1

    That was a really good explanation. It really helped me. Thanks a lot.

  • @gero8049
    @gero8049 3 years ago +1

    Hey, this really helped me get a better understanding of Andrew's course. Thank you.

  • @nerkulec
    @nerkulec 6 years ago +2

    Great! Thanks!

  • @danielrodriguezgonzalez2982

    This is the best!

  • @vishakdm7728
    @vishakdm7728 3 years ago +1

    Great work, really helped me a lot :)

  • @todianmishtaku6249
    @todianmishtaku6249 4 years ago +1

    Superb! Really liked it.

  • @lightningblade9347
    @lightningblade9347 6 years ago +2

    Hi deeplizard, before posting my question I first want to thank you (again) for these amazing videos and this valuable playlist; it's one of the main resources I use on my journey to mastering deep learning. I also think it's very important to know the mathematical foundations of neural nets to actually understand them as well as possible. My question is (it's a bit long, I hope it won't bother you): say we have a 3-by-1 neural net (an input layer with 3 neurons and an output layer with one neuron, no hidden layer). I have no problem calculating the outputs of forward and back propagation when feeding the network one training sample (namely the feature1, feature2, feature3 inputs), and I know exactly how my initial weights get optimized. The problem I have is when feeding the NN with multiple training inputs; here, I don't know exactly how the initial weights get optimized.
    I would be grateful if you could explain how the initial weights get modified when feeding the NN with multiple training inputs.
    (For example, we have training inputs as a 3 × 3 matrix:
    [[195, 90, 41],
    [140, 50, 30],
    [180, 85, 43]]
    where the first column is the height, the 2nd the weight, and the 3rd the shoe size, and we feed the NN the first row, then the second, then the third.)
    I know that to calculate the new weights when feeding the NN one training sample we rely on this formula:
    New_weights = Initial_weights - learning_rate × (derivative of the loss function wrt the weights)
    But when we feed the NN more than one training example, which formula do we use? Do we calculate the average of all dw (the derivatives of the loss function wrt the weights), or do we sum them all, multiply by the learning rate, and subtract them from the initial weights, or what?
    I'm a bit confused here.
    Thanks in advance.

    • @deeplizard
      @deeplizard  6 years ago +2

      Hey Lightning Blade - Glad to see you're progressing through the content! In regards to your question, your first thought is correct. We calculate the average derivative of the loss over all training samples. I touch on this in the next video starting at 11:56: czcams.com/video/Zr5viAZGndE/video.html
      Let me know if this helps clarify!
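      To illustrate the averaging described above, here is a minimal NumPy sketch (hypothetical code, not from the video or the course files): a single linear output node with a squared-error loss and no bias, updated with the gradient averaged over all training samples.

      import numpy as np

      # Hypothetical data from the question above: 3 samples x 3 features
      # (height, weight, shoe size). The targets y are made up for illustration.
      X = np.array([[195.0, 90.0, 41.0],
                    [140.0, 50.0, 30.0],
                    [180.0, 85.0, 43.0]])
      y = np.array([1.0, 0.0, 1.0])

      w = np.zeros(3)   # initial weights for the single output node (no bias)
      lr = 1e-6         # learning rate, kept small because the features are unscaled

      for step in range(1000):
          preds = X @ w                                      # forward pass: one linear output per sample
          # Per-sample loss is (pred_i - y_i)^2, so dLoss_i/dw = 2 * (pred_i - y_i) * x_i
          per_sample_grads = 2.0 * (preds - y)[:, None] * X  # shape: (3 samples, 3 weights)
          # One update uses the gradient AVERAGED over all training samples
          w -= lr * per_sample_grads.mean(axis=0)

      print(w)  # weights after training on the batch-averaged gradients

      Summing the per-sample gradients and dividing by the number of samples amounts to the same thing; the point is simply that a single weight update is driven by the batch-averaged gradient rather than by any one sample.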

  • @loneWOLF-fq7nz
    @loneWOLF-fq7nz 5 years ago +2

    keep up great videos :)

  • @panwong9624
    @panwong9624 6 years ago +1

    This video is great!

  • @codeXcycle
    @codeXcycle a year ago +1

    you're the best

  • @samhithbarlaya23
    @samhithbarlaya23 4 years ago +3

    @deeplizard, thanks for this great series. I have a query: at 8:03, you represent the input for node j as a function of all the weights connected to j. But my understanding is that the input is the weighted sum of the activation outputs of the previous layer. So shouldn't the activation outputs of the previous layer also be considered when representing the input function?

    • @SKULDROPR
      @SKULDROPR a year ago

      Our end goal is getting the derivative of the loss with respect to weights, not the output activations of the previous layer (for the reasons explained in the part 1 video). You could definitely make a valid expression using the output activations of the previous layer (good spot!); however, it is not useful for the goal at hand. Also, remember that the output activations are directly influenced by the weights, so they don't need to be in the expression either. Apologies for resurrecting, hopefully someone finds this useful, as it confused me at first too.

  • @WheatleyOS
    @WheatleyOS 3 years ago

    This video is giving me the urge to make a 3-node-per-layer, 3-layer neural network in Excel

  • @assemblyorganization522

    C (sub zero) is the loss for a particular sample. Suppose we have two classes (male and female) and four samples (two for each class). Will C (sub zero) represent the loss for the two female samples or the two male samples? Please explain this concept. Thanks.

  • @Boldalt
    @Boldalt 4 years ago

    Hi. Why do you say n-1 in the cost function?

  • @adesiph.d.journal461
    @adesiph.d.journal461 3 years ago +1

    Amazing Stuff! The way you break down the math is really neat. I love the way you spend time in walking through the notations, their meanings, and what they stand for. Would love to see a more comprehensive series on the math behind various loss functions, regularization techniques and maybe in general concepts from Ian Goodfellow's Deep Learning Book

  • @thespam8385
    @thespam8385 4 years ago +2

    {
      "question": "Use of the chain rule is required because:",
      "choices": [
        "the loss function is a composition of functions.",
        "of the vast number of weights.",
        "of the sign of the gradient.",
        "the gradient must flow backward."
      ],
      "answer": "the loss function is a composition of functions.",
      "creator": "Chris",
      "creationDate": "2020-04-17T17:32:03.668Z"
    }

    • @deeplizard
      @deeplizard  4 years ago

      More great questions, thanks Chris!
      Just added your question to deeplizard.com/learn/video/G5b4jRBKNxw :)

  • @ramiro6322
    @ramiro6322 3 years ago +1

    0:00 Introduction
    1:15 Outline for the episode
    1:44 Observations
    5:30 Expressing C0 as a composition of functions
    10:15 Next video and Outro

    • @deeplizard
      @deeplizard  3 years ago +1

      Added to the description. Thanks so much!

    • @ramiro6322
      @ramiro6322 3 years ago

      @@deeplizard Thank you for the videos, they're great!

  • @Jxordan
    @Jxordan 6 years ago +8

    ETA on part 4? I have a midterm Wednesday :/

    • @deeplizard
      @deeplizard  6 years ago +2

      Hey Jordan - I'm _hoping_ to have it released by Tuesday evening.

    • @Jxordan
      @Jxordan 6 years ago +1

      Thank you!

  • @hailhuskz
    @hailhuskz 2 years ago +1

    Hey deeplizard, this video made all my doubts on this topic go away :). However, just asking: what happened to the bias of each neuron? Should z(j) not be sum[w(jk) × a(k)] + bias(j)?

    • @deeplizard
      @deeplizard  2 years ago +1

      Yes, bias was eliminated (or assumed to be 0) here for simplicity since we hadn't yet covered bias in the course. With bias included, your assumption for the calculation of z(j) is correct. We cover bias in a later episode here:
      czcams.com/video/HetFihsXSys/video.html

    • @hailhuskz
      @hailhuskz 2 years ago

      @@deeplizard thank you so much for replying!
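    For reference, a minimal LaTeX statement of the weighted input discussed in this thread, with the bias term written in (the video itself leaves it out):

    z_j^{(l)} = \sum_{k} w_{jk}^{(l)} \, a_k^{(l-1)} + b_j^{(l)}

    Setting b_j^{(l)} = 0 recovers the bias-free expression used in the episode.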

  • @tostupidforname
    @tostupidforname 4 years ago +1

    I assume the bias is implicit in w?

  • @lucavoros8073
    @lucavoros8073 2 years ago

    Wouldn't it be (yi - aj(l))^2 instead of (aj(l) - yi)^2 ?

  • @mariodurndorfer6996
    @mariodurndorfer6996 5 years ago

    Maybe it is somewhat confusing to use subscript k for layer l-1 and j for layer l, but not use another subscript for the output layer L (j is used again)? Did I miss something?

    • @georgepalafox5967
      @georgepalafox5967 4 years ago

      Same here. I think another subscript should be used. For layer l-1 it's k, for layer l it's j, and for layer L it's j again??? Shouldn't it be a different index letter?

  • @kaushikkn
    @kaushikkn 5 years ago

    It would be great to actually introduce the Einstein summation convention instead of the sum; it looks a lot neater and less clumsy. Otherwise, thanks for the fantastic explanation.

  • @chavankoppa
    @chavankoppa 6 years ago +1

    Hi, it's a great explanation, thanks a lot. I have a quick question (9:05): zj(L), which is the input to the activation function aj(L), is a function of the weights wj(L) and of the activation output ak(l-1) of the previous layer, right?
    If that is correct, then C0j = C0j(aj(L)(zj(L)(wj(L), ak(l-1))))

    • @deeplizard
      @deeplizard  6 years ago +1

      Hey Chavan - Yes, that's right.
      Notation-wise, however, if we include ak(l-1) as you did above, then that would lead us to needing to express ak(l-1) as a function of the weights and the input of the previous layer, l-2. Then we'd go through the same process again, expressing ak(l-2) as a function of the weights and the input of the previous layer, l-3. We'd continue this over and over until we reached the start of the network.
      This is indeed correct, but it just gets a little messy with the notation if we continue expressing each function as a function of a function of a function of a ...
      I illustrate the concept and the math behind this idea in part 5 of the backpropagation series here: czcams.com/video/xClK__CqZnQ/video.html

    • @ssffyy
      @ssffyy 3 years ago

      @@deeplizard Awesome, that was the answer I was looking for. Great explanation BTW.
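    To make the composition discussed in this thread concrete, here is the chain-rule expansion for an output-layer weight that the later episodes build toward, written in LaTeX with the thread's notation (a sketch, not a transcription of the video):

    \frac{\partial C_0}{\partial w_{jk}^{(L)}}
      = \frac{\partial C_0}{\partial a_j^{(L)}}
        \cdot \frac{\partial a_j^{(L)}}{\partial z_j^{(L)}}
        \cdot \frac{\partial z_j^{(L)}}{\partial w_{jk}^{(L)}}

    Each factor differentiates one link in the chain from C_0 through a_j^(L) and z_j^(L) down to the weight, which is why expressing the loss as a composition of functions matters.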

  • @asadali4153
    @asadali4153 4 years ago

    The loss function is the same as in linear regression, and the derivative is also used in LR. So how is a NN different from linear regression?

    • @jsarvesh
      @jsarvesh 3 years ago

      We use non-linear activation functions in a NN to introduce non-linearity. You can also think of linear regression as a network with a linear activation in the final output layer.
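      A small sketch of that point in LaTeX (hypothetical two-layer notation, not from the video): if the activation were linear, g(z) = z, a stack of layers would collapse into ordinary linear regression,

      a^{(2)} = W^{(2)} \left( W^{(1)} x \right) = \left( W^{(2)} W^{(1)} \right) x = \widetilde{W} x ,

      so a non-linear g between the layers is what prevents this collapse and lets the network model non-linear relationships.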

  • @transolve9726
    @transolve9726 5 years ago

    At this point you should have given an example of the activation function g(). Also, it would be helpful to have a diagram/illustration on the right while you are explaining.

    • @deeplizard
      @deeplizard  5 years ago +2

      Hey Transolve - In parts 4 and 5 of the backprop videos, I use a diagram to illustrate the math. Be sure to check those out!
      Also, here is our video/blog on activation functions if you're interested: deeplizard.com/learn/video/m0pIlLfpXWE
      Any activation function can be substituted in for g().

    • @transolve9726
      @transolve9726 5 years ago

      I later saw the illustration in the next video (part 5), and I now see you have a separate video on activation functions.

  • @MaahirGupta
    @MaahirGupta 3 years ago

    Damn......

  • @poulamikar5921
    @poulamikar5921 4 years ago

    Found these too basic. I'm trying to train a CNN on an artist's drawings to classify different features from them. I don't think the formulations alone are going to be of any help; they're in every DL lecture. Could you make a lecture on training with features of an image?

  • @fupopanda
    @fupopanda 5 years ago

    So I assume n is the number of nodes in a layer. That's the only logical answer that works here.

  • @rohtashbeniwal9202
    @rohtashbeniwal9202 4 years ago

    video good, why lizard?

    • @deeplizard
      @deeplizard  4 years ago +1

      *I could a tale unfold whose lightest word*
      *Would harrow up thy soul.*
      👻🦎

  • @markcuello5
    @markcuello5 a year ago

    HELP

  • @sukantdebnath4463
    @sukantdebnath4463 5 years ago

    Very Complicated..

  • @sprajapati2011
    @sprajapati2011 4 years ago

    change the name of the channel