- added 14. 08. 2018
- I'm (finally after all this time) thinking of new videos. If I get attention in the donate button area, I will proceed:
www.paypal.com/donate/?busine...
Easy explanation for how backpropagation is done. Topics covered:
- gradient descent
- exploding gradients
- learning rate
- backpropagation
- cost functions
- optimization steps - Science & Technology
Hey! This is the best explanation I have ever seen on the internet. I was trying to understand these concepts by watching videos, etc., but without positive results. Now I understand how these networks function and their structure. I have forgotten my calculus, and here you explain the chain rule in very simple words anyone can understand. Thank you for these great videos, and God bless.
The best backpropagation explainer on youtube we have in 2022
Thank you so much Mikael. Extremely clear. A good foundation before going further!
I was trying to understand this for so long! You made my day.
Amazing, I spent hours trying to understand this from different sources, and now I think I finally understand. Thank you!
Hi Mike, I want to thank you for this great explanation. I was really struggling to grasp the concept. No one else went quite as far in depth.
The best explanation.
Knew what it was but never understood why. Thank you for this video!
Hello Mikael, finally someone who is able to explain complexity through simplicity. Thank you very much for revealing the secrets most videos keep hidden.
Good job! This is a tough subject, and you tried, with success, to simplify the explanation as much as possible. I did the Andrew Ng course on Coursera, and his explanation was difficult to understand even for me, who had prior knowledge of the maths involved. Now I think you should implement this algorithm in Python, for example.
Great job Mikael, you explained it very clearly.
Thank you
Good job! Thank you very much!
you guys know that you can just turn up the volume right? Thank you Mike for breaking it down so clearly!
Absolutely the best explanation for the backpropagation thank you thank you thank you
I had been struggling with this subject for weeks; now I have a better understanding after watching this video.
wow, so far the best explanation I found. So simple, thanks a lot !
Excellent job! Thank you!
Thanks Mikael, simplest explanation. You made my day mate.
That's the best explanation I have seen. Thanks a lot!
I can't thank you enough. I paid so much money, taking out a loan for one course, but did not understand it there. Thank you for your help.
good stuff! a clear, simple starting case to build on.
It is one of the best explanations on this subject! Thanks so much!
Thank you so much! Best explanation I’ve seen ever.
Thank you very much. This helped a lot. I now understand the lecture given to me.
Finally I understand this! Thanks
Extremely good and easy explanation
great explanation Mikael thanks a lot
totally underrated video. love it.
Thank you for this video it really helped me!
Good stuff. Thanks for contributing
Thank you very much best explanation ever!
thanks man. You are a hero!
It's not easy!! You made it easy. Thanks a lot.
simple and neat! thanks!
Wow, thank you
Cool. Thanks Mikael!!!
Great! Finally I understand something.
Without a hidden layer it is a bit difficult to understand how to apply backpropagation. But this is the thing that no tutorial explains, and you would be the right person to teach us. I use Keras, but Python would also be good: "How to create your own classification or regression dataset". Thank you.
Thank you for your comment! At the end of the video the generalized case is briefly explained. If you follow the math exactly as in the single-weight case, you will see it works out. If I find time, I may make a video about that, but it might be a bit redundant.
Thanks sir, it is a very good explanation.
Simply outstanding , clear and concise explanation. I wonder how people with no calculus background learn deep learning?
Thank you, sir. Helped a smooth brain understand.
good effort
Thank you, it really helped me understand the principle behind backpropagation. In the future I would like to see how to implement it with layers that have 2 or more neurons; how to calculate the error for each neuron in that case, to be precise.
Thanks Laine.
Thank you!
Thank you Mikael for this concise lecture. Can you share a lecture with the cost function of logistic regression implemented in Neural Network?
very nice explanation!
Good Job! …. Thanks
There are some videos that you wish would never end. This video is among the top of those.
Excellent
very well explained👍
Thank you
Amazing explanation but the audio is painful to listen to
I understand the logic and the thoughts behind this concept. Unfortunately I just can't wrap my head around how to calculate it with these kinds of formulas.
But if I saw a code example I would understand it without an issue. I don't know why my brain works like that. But mathematical formulas are mostly useless to me =(
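For readers in the same boat, here is a minimal sketch of the single-weight example from the video in Python (my own reconstruction, not the author's code; the learning rate 0.1 is an assumed value).

```python
# Single weight, single input: a = i * w, cost C = (a - y)^2.
# Values from the video: input i = 1.5, initial weight w = 0.8, target y = 0.5.
i, w, y = 1.5, 0.8, 0.5
lr = 0.1  # learning rate (assumed, not from the video)

for step in range(50):
    a = i * w                # forward pass
    dC_dw = 2 * (a - y) * i  # chain rule: dC/da * da/dw
    w -= lr * dC_dw          # gradient descent step

print(round(w, 4))  # → 0.3333, i.e. the weight that makes a = 0.5
```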
Do you move back a layer after getting w_1 = 0.59, or after getting w_1 = 0.333?
What software did you use to write the stuff? Good lecture.
What changes in the equation if I have more than just 1 Neuron per Layer though? Especially since they are cross-connected via more weights, I don't know exactly how to deal with this.
What about when we have an activation function like ReLU, etc.?
At 7:53, what are the values for a and y that have the parabola experiencing a minimum around 0.3334, when for a desired y value of 0.5 the value of a would have to be 0.5? That is, the minimum of the cost function occurs when a is 0.5, so why in the graph has its minimum been relocated to 0.3334?
Any help dealing with multi-neuron layers? The formulas at 11:19 look different for multi-neuron layers.
Check my channel for another example with multiple layers.
Would you please talk about the situation with activate function(sigmoid)?
It's a little bit confusing for me..
Thanks a lot!
There is now a video about this: czcams.com/video/CoPl2xn2nmk/video.html
Nice video. I'm a little confused about which letters stand for which values:
- a = value from the activation function, or just simply the output from any given neuron?
- C = loss/error gradient
And which of these values qualifies as the gradient?
a=activation (with or without activation function)
C=loss/error/cost (these are all the same thing, the naming varies between textbooks and frameworks)
WRT gradients: this is a 1-dimensional case for educational/amusement purposes. In actual networks, you would have more weights, therefore more dimensions and you would use the term 'gradient' or 'jacobian', depending on how you implement it etc.
I have an example with two dimensions here: czcams.com/video/Bdrm-bOC5Ek/video.html
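A tiny two-dimensional illustration in Python (my own sketch, not the linked video's code; the inputs, target, and learning rate are all made-up values): with two inputs and two weights, the single slope becomes a gradient vector with one partial derivative per weight.

```python
# Two inputs, two weights: a = i1*w1 + i2*w2, cost C = (a - y)^2.
i1, i2, y, lr = 1.5, 2.0, 0.5, 0.05  # made-up values
w1, w2 = 0.8, -0.3

for step in range(100):
    a = i1 * w1 + i2 * w2            # forward pass
    dC_da = 2 * (a - y)              # slope of the cost w.r.t. a
    grad = (dC_da * i1, dC_da * i2)  # gradient: one partial per weight
    w1 -= lr * grad[0]
    w2 -= lr * grad[1]

print(round(i1 * w1 + i2 * w2, 4))  # → 0.5, the target output
```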
something happened to the sound at 8:21
You got that right Peter Peters Peterss
Don't you hate it when the lecturer goes outside for a cigarette in the middle of a lecture... but continues teaching through the window.
Yes, we get it... your powerpoint remote works through glass! But WE CAN'T HEAR YOU! XD
Yes, sorry about that!
@@garychap8384 LMFAO
How do we determine which way to go? I mean, the direction of change in the weight, if we are on the left side of the concave curve?
No one ever says, when multiple layers and multiple outputs exist, how the weights get adjusted: do you do numerous forward passes, one after each individual weight is adjusted? Or do you update ALL the weights and THEN do a single new forward pass?
Yeah, single forward pass (during which gradients get stored, see my other videos) followed by a single backpropagation pass through the entire network, updating all weights by a bit.
@@mikaellaine9490 Thanks. Much appreciated.
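The order described in the reply can be sketched in Python (my own toy example with made-up numbers, not the author's code): one forward pass stores the activations, one backward pass computes the gradient for every weight, and only then are all weights updated together.

```python
# Two-layer chain: a1 = i*w1, a2 = a1*w2, cost C = (a2 - y)^2.
i, y, lr = 1.5, 0.5, 0.05  # made-up values
w1, w2 = 0.8, 0.6

for step in range(200):
    # one forward pass (activations stored)
    a1 = i * w1
    a2 = a1 * w2
    # one backward pass: gradients for ALL weights, using stored activations
    dC_da2 = 2 * (a2 - y)
    grad_w2 = dC_da2 * a1      # dC/dw2
    grad_w1 = dC_da2 * w2 * i  # dC/dw1, chain rule through the second layer
    # only now is every weight updated, all at once
    w2 -= lr * grad_w2
    w1 -= lr * grad_w1

print(round(i * w1 * w2, 4))  # → 0.5, the network output reaches the target
```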
If I had different numbers of neurons per layer, then would the formula at 11:30 be changed to (average of the activations of the last layer) * (average of the weights of the next layer) ... * (average cost of all outputs)?
From what I read, yes. But how the error is distributed can be varied too.
I don't know anything about this subject, but I was understanding it until the rate-of-change function. Probably a stupid question, but why is there a 2 in the rate-of-change function, as in 2(a-y)? Is this 2 * (1.2 - 0.5)? Why the 2? I can't really see the reference to y = x^2, but that's probably just me not understanding the basics. Maybe somebody can explain for a dummy like me.
Wait, maybe I understand my mistake: the result should be 0.4, right? So it's actually 2(a - 1), because otherwise multiplication goes first and you end up with 1.4?
The derivative of x^2 (x squared) is 2x. The cost function C is the square of the difference between actual and desired output i.e. (a-y)^2. Its derivative (slope) with respect to a is 2(a-y).
We don't use the actual cost to make the adjustment, but the slope of the cost. That always points 'downhill' to zero cost.
Where does the '-1' come from? It looks like it is in the position of y but y is 0.5. Not -1.
Oh. I see. the 2 was distributed to it but not to a.
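The reply above is easy to check numerically: the slope of C = (a - y)^2 at a = 1.2 with y = 0.5 should be 2(a - y) = 1.4. A quick central-difference check in Python (my own sketch, not from the video):

```python
# Numerically estimate the slope of the cost C = (a - y)^2 at a = 1.2.
y = 0.5
C = lambda a: (a - y) ** 2

a, h = 1.2, 1e-6
numeric_slope = (C(a + h) - C(a - h)) / (2 * h)  # central difference
print(round(numeric_slope, 4))  # → 1.4, matching the formula 2*(a - y)
```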
"as per usual? um what is usaul?
Can you please tell me how you graphed that cost function? I plotted it in my calculator and I am getting a different polynomial. I graphed ((x*0.8)-0.5)**2. Thanks.
Hi and thank you for your question. I've used Apple's Grapher for all the plots. It should look like in the video. Your expression ((x*0.8)-0.5)**2 is correct.
Also, I'm not really able to find anywhere what delta signifies here, only stuff on the delta rule.
At 2:40, Mikael mentioned "...and the error therefore, is 0.5". I think he meant "and the *desired output*, therefore, is 0.5"? A slight erratum, perhaps?
because otherwise, the cost (C) is 0.49, not 0.5
Please, how can I update the bias? Can someone answer me?
Maybe this is a dumb question, but how do you go from 2(a-y) to 2a-1? (7:17)
Oh, I got it: 0.5 is the desired output, * 2 = 1
What if the number of neurons in the layer is more than one?
There is one thing I don't understand.
Suppose you have two inputs; for the first input the perfect value is w1 = 0.33,
but for the second input, the perfect value would be w1 = 0.67.
How would you compute the backpropagation to get the perfect value that minimizes the cost function?
Run multiple experiments with different inputs and measure the outcome: if the outcome is perfect, there is no learning. How would you answer the question?
It wasn't for dummies. It was for scientists.
he is the dummy
It's for anyone who has taken calculus.
what about code?
Hi,
a = i * w
1.5 * 2(a - y) = 4.5 * w - 1.5
What happened to the y?
y is given = the target value, here 0.5. => 1.5 * 2(1.2 - 0.5) = 2.1, which equals 4.5 * 0.8 - 1.5.
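The identity in the reply above can be verified in a few lines of Python (all numbers taken from the thread):

```python
# With i = 1.5 and y = 0.5, the gradient i * 2*(a - y), where a = i*w,
# expands to 4.5*w - 1.5: the y gets folded into the constant -1.5.
i, y, w = 1.5, 0.5, 0.8
a = i * w                # a = 1.2
form1 = i * 2 * (a - y)  # 1.5 * 2 * (1.2 - 0.5)
form2 = 4.5 * w - 1.5    # 3.6 - 1.5
print(round(form1, 4), round(form2, 4))  # → 2.1 2.1
```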
Can anyone explain how to plot 2(a-y) and C = (a-y)^2, with i = 1.5?
Suddenly, I feel depressed... around 8:20
what about 2d input
There is a video for 2d input: czcams.com/video/Bdrm-bOC5Ek/video.html
If all I want is:
a = 0.5
with:
a = i · w
then:
w = a / i = 0.3333
One simple division, no differential calculus, no gradient descent :-)
Brilliant. Now generalize that to any sized layer and any number of layers. I suppose you won't need bias units at all. You have just solved deep learning. Profit.
It was more or less comprehensible until the "mirrored 6" character appeared, with no explanation of what it was, what it was called, or why it was there. So let's move on to another video on backpropagation...
4:36 me watching this in 8th grade: bruh
the word "probably" means that, there are people who don't know this (you as example).
7:04: 2(1.5 * w) - 1 = 2(1.5 * 0.8) - 1 = 1.4, not 1.5.
During editing do you not notice how much you lip smack? Makes it so hard to listen to. Otherwise, thank you, the content is helpful.
Thank you !