  • Published on 14 Aug 2018
  • I'm (finally after all this time) thinking of new videos. If I get attention in the donate button area, I will proceed:
    www.paypal.com/donate/?busine...
    Easy explanation for how backpropagation is done. Topics covered:
    - gradient descent
    - exploding gradients
    - learning rate
    - backpropagation
    - cost functions
    - optimization steps
  • Science & Technology

Comments • 114

  • @Orthodoxforever71
    @Orthodoxforever71 3 years ago +7

    Hey! This is the best explanation I have ever seen on the internet. I was trying to understand these concepts by watching videos, etc., but without positive results. Now I understand how these networks function and their structure. I have forgotten my calculus, and here you explain the chain rule in very simple words anyone can understand. Thank you for these great videos and God bless.

  • @shanruan2524
    @shanruan2524 2 years ago +1

    The best backpropagation explainer on YouTube we have in 2022

  • @gillesgardy8957
    @gillesgardy8957 4 years ago +2

    Thank you so much Mikael. Extremely clear. A good foundation before going further!

  • @rohitd7834
    @rohitd7834 4 years ago +3

    I was trying to understand this for so long! You made my day.

  • @originalandfunnyname8076

    Amazing, I spent hours trying to understand this from different sources and now I think I finally understand, thank you!

  • @vunpac5
    @vunpac5 4 years ago +2

    Hi Mike, I want to thank you for this great explanation. I was really struggling to grasp the concept. No one else went quite as far in depth.

  • @gulamm1
    @gulamm1 1 month ago

    The best explanation.

  • @farenhite4329
    @farenhite4329 4 years ago +1

    Knew what it was but never understood why. Thank you for this video!

  • @alexandrefabretti1174
    @alexandrefabretti1174 3 years ago +3

    Hello Mikael, finally someone who is able to explain complexity through simplicity. Thank you very much for revealing the secrets hidden by most videos.

  • @qzwwzt
    @qzwwzt 5 years ago +22

    Good job! This is a tough subject and you tried, with success, to simplify the explanation as much as possible. I did the Andrew Ng course on Coursera, and his explanation was difficult to understand even for me, with prior knowledge of the maths involved. Now I think you should implement this algorithm in Python, for example.

  • @muhammeddal9661
    @muhammeddal9661 5 years ago +1

    Great job Mikael, you explained it very clearly.
    Thank you

  • @user-vi2fp6dl7b
    @user-vi2fp6dl7b 3 months ago

    Good job! Thank you very much!

  • @allenjerjiss3163
    @allenjerjiss3163 4 years ago +2

    you guys know that you can just turn up the volume right? Thank you Mike for breaking it down so clearly!

  • @turkirob
    @turkirob 4 years ago +1

    Absolutely the best explanation of backpropagation, thank you thank you thank you

  • @ekoprasetyo3999
    @ekoprasetyo3999 2 years ago

    I was struggling with this subject for weeks; now I have a better understanding after watching this video.

  • @imed6240
    @imed6240 3 years ago

    Wow, so far the best explanation I've found. So simple, thanks a lot!

  • @newcoder7166
    @newcoder7166 5 years ago +1

    Excellent job! Thank you!

  • @faisalriazbhatti
    @faisalriazbhatti 3 years ago

    Thanks Mikael, simplest explanation. You made my day mate.

  • @denisvoronov6571
    @denisvoronov6571 3 years ago +1

    That's the best explanation I have seen. Thanks a lot!

  • @chinmay6144
    @chinmay6144 1 year ago

    I can't thank you enough. I paid so much money, taking out a loan for a course, but did not understand it there. Thank you for your help.

  • @jarrodhaas
    @jarrodhaas 2 years ago

    good stuff! a clear, simple starting case to build on.

  • @flavialan4544
    @flavialan4544 3 years ago

    It is one of the best explanations of this subject! Thanks so much!

  • @klyntonh7168
    @klyntonh7168 2 years ago

    Thank you so much! Best explanation I’ve seen ever.

  • @thechen6985
    @thechen6985 5 years ago

    Thank you very much. This helped a lot. I now understand the lecture given to me.

  • @dabdas100
    @dabdas100 4 years ago +1

    Finally I understand this! Thanks

  • @cvsnreddy1700
    @cvsnreddy1700 4 years ago

    Extremely good and easy explanation

  • @JoeWong81
    @JoeWong81 4 years ago +1

    Great explanation Mikael, thanks a lot

  • @joelmun2780
    @joelmun2780 1 year ago

    totally underrated video. love it.

  • @stuartallen2001
    @stuartallen2001 4 years ago +2

    Thank you for this video it really helped me!

  • @danikhan21
    @danikhan21 4 years ago

    Good stuff. Thanks for contributing

  • @TheStrelok7
    @TheStrelok7 3 years ago

    Thank you very much best explanation ever!

  • @murat2073
    @murat2073 1 year ago +1

    thanks man. You are a hero!

  • @hasanabdlghani5244
    @hasanabdlghani5244 4 years ago +1

    It's not easy!! You made it easy. Thanks a lot

  • @djeros666
    @djeros666 5 years ago +1

    simple and neat! thanks!

  • @labCmais135
    @labCmais135 1 month ago

    Wow, thank you

  • @ksrajavel
    @ksrajavel 3 years ago

    Cool. Thanks Mikael!!!

  • @nemuccio1
    @nemuccio1 4 years ago +1

    Great! Finally I understand something.
    Without a hidden layer it is a bit difficult to understand how to apply backpropagation. But the thing that no tutorial explains is this, and you would be the right person to teach us. I use Keras, but plain Python would also be good: "How to create your own classification or regression dataset". Thank you.

    • @mikaellaine9490
      @mikaellaine9490  3 years ago +1

      Thank you for your comment! At the end of the video the generalized case is briefly explained. If you follow the math exactly as in the single-weight case, you will see it works out. If I find time, I may make a video about that, but it might be a bit redundant.

  • @safiasafia9950
    @safiasafia9950 5 years ago

    Thanks sir, it is a very good explanation.

  • @xyzaex
    @xyzaex 2 years ago

    Simply outstanding, clear and concise explanation. I wonder how people with no calculus background learn deep learning?

  • @obsidianhead
    @obsidianhead 10 months ago

    Thank you, sir. Helped a smooth brain understand.

  • @prof.meenav1550
    @prof.meenav1550 2 years ago

    good effort

  • @datasow9493
    @datasow9493 5 years ago +1

    Thank you, it really helped me to understand the principle behind backpropagation. In the future I would like to see how to implement it with layers that have 2 or more neurons; how to calculate the error for each neuron in that case, to be precise.

  • @vijayyarabolu9067
    @vijayyarabolu9067 5 years ago

    Thanks Laine.

  • @scottk5083
    @scottk5083 4 years ago +1

    Thank you!

  • @RagibShahariar
    @RagibShahariar 4 years ago

    Thank you Mikael for this concise lecture. Can you share a lecture with the cost function of logistic regression implemented in Neural Network?

  • @MukeshArethia
    @MukeshArethia 4 years ago

    very nice explanation!

  • @ahmidiedu7112
    @ahmidiedu7112 1 year ago +1

    Good Job! …. Thanks

  • @talhanaeemrao4305
    @talhanaeemrao4305 7 months ago

    There are some videos that you wish would never end. This video is among the top of those.

  • @trevortyne534
    @trevortyne534 1 year ago

    Excellent

  • @andrew-cb6lh
    @andrew-cb6lh 1 year ago

    very well explained👍

  • @SureshBabu-tb7vh
    @SureshBabu-tb7vh 5 years ago

    Thank you

  • @georgeruellan
    @georgeruellan 3 years ago +2

    Amazing explanation but the audio is painful to listen to

  • @AleksanderFimreite
    @AleksanderFimreite 3 years ago +3

    I understand the logic and the thoughts behind this concept. Unfortunately I just can't wrap my head around how to calculate it with these kinds of formulas.
    But if I saw a code example I would understand it without an issue. I don't know why my brain works like that. But mathematical formulas are mostly useless to me =(
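
    For readers who, like the comment above, prefer code: a minimal Python sketch of the single-weight example discussed in the video. The values i = 1.5, w = 0.8 and y = 0.5 are from the video; the learning rate of 0.1 is an assumption (it reproduces the w = 0.59 value mentioned in another comment below).

        # Single neuron, single weight: a = i * w, cost C = (a - y)^2
        i, w, y = 1.5, 0.8, 0.5   # input, initial weight, desired output (from the video)
        lr = 0.1                  # learning rate (assumed)

        for step in range(20):
            a = i * w                 # forward pass: activation
            cost = (a - y) ** 2       # cost before the update
            dC_dw = 2 * (a - y) * i   # chain rule: dC/da * da/dw
            w -= lr * dC_dw           # gradient descent step
            print(step, round(w, 4), round(cost, 4))

        # w moves 0.8 -> 0.59 -> ... and converges towards 0.3333,
        # where a = i * w = 0.5 and the cost is zero.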

  • @cdxer
    @cdxer 5 years ago +1

    Do you move back a layer after getting w_1 = 0.59? Or after getting w_1 = 0.333?

  • @raaziyahshamim4761
    @raaziyahshamim4761 6 months ago

    What software did you use to write the stuff? Good lecture.

  • @_FLOROID_
    @_FLOROID_ 3 years ago

    What changes in the equation if I have more than just 1 neuron per layer though? Especially since they are cross-connected via more weights, I don't know exactly how to deal with this.

  • @user-jy5pu6bg5p
    @user-jy5pu6bg5p 11 days ago

    What about when we have an activation function like ReLU, etc.?

  • @atlantaguitar9689
    @atlantaguitar9689 1 year ago

    At 7:53, what are the values of a and y that have the parabola reaching its minimum around 0.3334, when for a desired y value of 0.5 the value of a would have to be 0.5? That is, the minimum of the cost function occurs when a is 0.5, so why has the minimum in the graph been relocated to 0.3334?
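
    One possible reading, assuming the parabola at 7:53 plots the cost against the weight w rather than against the activation a: C(w) = (i*w - y)^2 = (1.5*w - 0.5)^2, which reaches its minimum at w = 0.5/1.5 ≈ 0.3334, and at that point a = i*w = 0.5, so the desired output is still 0.5.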

  • @cachaceirosdohawai3070
    @cachaceirosdohawai3070 3 months ago

    Any help dealing with multi-neuron layers? The formulas at 11:19 look different for multi-neuron layers.

    • @mikaellaine9490
      @mikaellaine9490  3 months ago

      Check my channel for another example with multiple layers.

  • @dennistsai5348
    @dennistsai5348 5 years ago +1

    Would you please talk about the situation with an activation function (sigmoid)?
    It's a little bit confusing for me.
    Thanks a lot!

    • @mikaellaine9490
      @mikaellaine9490  4 years ago

      There is now a video about this: czcams.com/video/CoPl2xn2nmk/video.html
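
      As a sketch of how the chain from this video extends once a sigmoid is added (this is the standard sigmoid derivative, not a detail taken from the video): with z = i*w and a = sigmoid(z) = 1/(1 + e^(-z)), the chain rule simply gains one extra factor, dC/dw = dC/da * da/dz * dz/dw = 2(a - y) * sigmoid(z)*(1 - sigmoid(z)) * i.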

  • @bubblesgrappling736
    @bubblesgrappling736 4 years ago

    Nice video, I'm a little confused about which letters stand for which values:
    - a = value from the activation function, or just simply the output from any given neuron?
    - C = loss/error gradient?
    And which of these values qualifies as the gradient?

    • @mikaellaine9490
      @mikaellaine9490  4 years ago +1

      a=activation (with or without activation function)
      C=loss/error/cost (these are all the same thing, the naming varies between textbooks and frameworks)
      WRT gradients: this is a 1-dimensional case for educational/amusement purposes. In actual networks, you would have more weights, therefore more dimensions and you would use the term 'gradient' or 'jacobian', depending on how you implement it etc.
      I have an example with two dimensions here: czcams.com/video/Bdrm-bOC5Ek/video.html

  • @petermpeters
    @petermpeters 5 years ago +26

    something happened to the sound at 8:21

    • @jayanttanwar4703
      @jayanttanwar4703 4 years ago +4

      You got that right Peter Peters Peterss

    • @garychap8384
      @garychap8384 4 years ago +2

      Don't you hate it when the lecturer goes outside for a cigarette in the middle of a lecture... but continues teaching through the window.
      Yes, we get it... your powerpoint remote works through glass! But WE CAN'T HEAR YOU! XD

    • @mikaellaine9490
      @mikaellaine9490  4 years ago +2

      Yes, sorry about that!

    • @BrandonSLockey
      @BrandonSLockey 3 years ago

      @@garychap8384 LMFAO

    • @goksuceylan8844
      @goksuceylan8844 3 years ago +1

      Peter Peters Peterss Petersss

  • @mehedeehassan208
    @mehedeehassan208 2 years ago

    How do we determine which way to go? I mean, the direction of change in the weight, if we are on the left side of the concave curve?

  • @FPChris
    @FPChris 2 years ago

    No one ever says, when multiple layers and multiple outputs exist, when the weights get adjusted: do you do numerous forward passes, one after each individual weight is adjusted? Or do you update ALL the weights and THEN do a single new forward pass?

    • @mikaellaine9490
      @mikaellaine9490  2 years ago +1

      Yeah, single forward pass (during which gradients get stored, see my other videos) followed by a single backpropagation pass through the entire network, updating all weights by a bit.
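
      A rough pseudocode sketch of that order of operations (the network and layer attribute names here are made up for illustration, not taken from the video):

          # One training step: ONE forward pass, ONE backward pass, then update ALL weights.
          def training_step(network, x, target, learning_rate):
              prediction = network.forward(x)               # forward pass; layers store activations
              cost = (prediction - target) ** 2             # cost C = (a - y)^2
              network.backward(2 * (prediction - target))   # backward pass; gradient for every weight
              for layer in network.layers:                  # only now are all weights nudged a little
                  layer.weights -= learning_rate * layer.weight_gradients
              return cost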

    • @FPChris
      @FPChris 2 years ago

      @@mikaellaine9490 Thanks. Much appreciated.

  • @puppergump4117
    @puppergump4117 2 years ago

    If I had different amounts of neurons per layer, then would the formula at 11:30 be changed to (average of the activations of the last layer) * (average of the weights of the next layer) ... * (average cost of all outputs)?

    • @TheRainHarvester
      @TheRainHarvester 1 year ago

      From what I read, yes. But distributing the error can be varied too.

  • @3r1kz
    @3r1kz 2 months ago

    I don't know anything about this subject but I was understanding it until the rate-of-change function. Probably a stupid question, but why is there a 2 in the rate-of-change function, as in 2(a-y)? Is this 2 * (1.2 - 0.5)? Why the 2? I can't really see the reference to y = x^2, but that's probably just me not understanding the basics. Maybe somebody can explain for a dummy like me.
    Wait, maybe I understand my mistake, the result should be 0.4 right? So it's actually 2(a-1), because otherwise multiplication goes first and you end up with 1.4?

    • @joemurray1
      @joemurray1 1 month ago

      The derivative of x^2 (x squared) is 2x. The cost function C is the square of the difference between actual and desired output i.e. (a-y)^2. Its derivative (slope) with respect to a is 2(a-y).
      We don't use the actual cost to make the adjustment, but the slope of the cost. That always points 'downhill' to zero cost.
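
      Written out with the video's numbers (i = 1.5, w = 0.8, y = 0.5, so a = i*w = 1.2): C = (a - y)^2, dC/da = 2(a - y), and by the chain rule dC/dw = dC/da * da/dw = 2(a - y) * i = 2*(1.2 - 0.5)*1.5 = 2.1. One gradient descent step with a learning rate of 0.1 (assumed) then gives w = 0.8 - 0.1*2.1 = 0.59.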

  • @onesun3023
    @onesun3023 4 years ago +1

    Where does the '-1' come from? It looks like it is in the position of y but y is 0.5. Not -1.

    • @onesun3023
      @onesun3023 4 years ago

      Oh, I see. The 2 was distributed to it but not to a.

  • @chrischoir3594
    @chrischoir3594 3 years ago +1

    "As per usual"? Um, what is usual?

  • @lukec5838
    @lukec5838 3 years ago

    Can you please tell me how you graphed that cost function? I plotted this cost function on my calculator and I am getting a different polynomial. I graphed ((x*0.8)-0.5)**2. Thanks.

    • @mikaellaine9490
      @mikaellaine9490  3 years ago

      Hi and thank you for your question. I've used Apple's Grapher for all the plots. It should look like in the video. Your expression ((x*0.8)-0.5)**2 is correct.
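
      For anyone without Grapher, a minimal matplotlib sketch of the same expression (taken verbatim from the comment above; x is just the plotting variable):

          import numpy as np
          import matplotlib.pyplot as plt

          x = np.linspace(0.0, 1.0, 200)
          plt.plot(x, (x * 0.8 - 0.5) ** 2)   # the expression from the comment above
          plt.xlabel("x")
          plt.ylabel("cost")
          plt.show()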

  • @bubblesgrappling736
    @bubblesgrappling736 4 years ago

    Also, I'm not really able to find anywhere what delta signifies here, only stuff on the delta rule.

  • @semtex6412
    @semtex6412 6 months ago

    At 2:40, Mikael mentioned "...and the error therefore, is 0.5". I think he meant "and the *desired output*, therefore, is 0.5"? A slight erratum, perhaps?

    • @semtex6412
      @semtex6412 6 months ago

      because otherwise, the cost (C) is 0.49, not 0.5

  • @dilbreenibrahim4128
    @dilbreenibrahim4128 3 years ago +1

    Please, how can I update the bias? Can someone answer me?

  • @maravilhasdobrasil4498

    Maybe this is a dumb question, but how do you go from 2(a-y) to 2a-1? (7:17)
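
    This appears to be just substituting the desired output y = 0.5 from the example: 2(a - y) = 2(a - 0.5) = 2a - 1.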

  • @hikmetdemir1032
    @hikmetdemir1032 10 months ago

    What if the number of neurons in the layer is more than one?

  • @oposicionine4074
    @oposicionine4074 1 year ago

    There is one thing I don't understand.
    Suppose you have two inputs; for the first input the perfect value is w1 = 0.33,
    but for the second input the perfect value would be w1 = 0.67.
    How would you compute the backpropagation to get the perfect value that minimizes the cost function?

    • @accadia1983
      @accadia1983 1 year ago

      Run multiple experiments with different inputs and measure the outcome: if the outcome is perfect, there is no learning. How would you answer the question?
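
      One standard way to look at it (not covered in this video, so treat the details as an assumption): gradient descent does not chase either per-example "perfect" weight; it follows the gradient of the total (or average) cost over the examples and settles on a compromise in between. A small Python sketch, with the input and targets chosen so the per-example optima are roughly 0.33 and 0.67 as in the question above:

          inputs  = [1.5, 1.5]      # illustrative inputs (assumed)
          targets = [0.5, 1.0]      # chosen so the per-example optima are w = 0.33 and w = 0.67
          w, lr = 0.8, 0.1

          for _ in range(1000):
              # average gradient of (i*w - y)^2 over both examples
              grad = sum(2 * (i * w - y) * i for i, y in zip(inputs, targets)) / len(inputs)
              w -= lr * grad

          print(round(w, 4))  # ~0.5: the compromise that minimizes the summed cost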

  • @coxixx
    @coxixx 4 years ago +3

    It wasn't for dummies. It was for scientists.

  • @jameshopkins3541
    @jameshopkins3541 3 years ago

    what about code?

  • @rafaelramos6320
    @rafaelramos6320 5 months ago

    Hi,
    a = i * w
    1.5 * 2(a - y) = 4.5 * w - 1.5
    What happened to the y?

    • @LaurentPrat
      @LaurentPrat 3 months ago

      y is given = the target value, here 0.5. => 1.5*2(1.2-0.5) = 2.1, which equals 4.5*0.8-1.5.
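
      Expanded with the same numbers (i = 1.5, y = 0.5, a = i*w): 1.5 * 2(a - y) = 3(1.5*w - 0.5) = 4.5*w - 1.5, so the y has not disappeared; with y = 0.5 it becomes the constant term -1.5.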

  • @vudathabhavishya9629
    @vudathabhavishya9629 4 months ago

    Can anyone explain how to plot 2(a-y) and C = (a-y)^2, with i = 1.5?

  • @shameelfaraz
    @shameelfaraz 3 years ago +1

    Suddenly, I feel depressed... around 8:20

  • @hakankosebas2085
    @hakankosebas2085 3 years ago

    What about 2D input?

    • @mikaellaine9490
      @mikaellaine9490  3 years ago

      There is a video for 2d input: czcams.com/video/Bdrm-bOC5Ek/video.html

  • @theyonly7493
    @theyonly7493 3 years ago

    If all I want is:
    a = 0.5
    with:
    a = i · w
    then:
    w = a / i = 0.3333
    One simple division, no differential calculus, no gradient descent :-)

    • @mikaellaine9490
      @mikaellaine9490  3 years ago

      Brilliant. Now generalize that to any sized layer and any number of layers. I suppose you won't need bias units at all. You have just solved deep learning. Profit.

  • @user-th7gd7ge4p
    @user-th7gd7ge4p 1 year ago

    It was more or less comprehensible until the "mirrored 6" character appeared with no explanation of what it was, what it was called, or why it was there. So let's move on to another video on backpropagation...

  • @bettercalldelta
    @bettercalldelta 2 years ago

    4:36 me watching this in 8th grade: bruh

    • @b.f.9484
      @b.f.9484 1 year ago

      The word "probably" means that there are people who don't know this (you, for example).

  • @vast634
    @vast634 3 years ago

    7:04: 2(1.5 * w) -1 = 2(1.5 * 0.8) -1 = 1.4 not 1.5

  • @NavySturmGewehr
    @NavySturmGewehr 2 years ago

    During editing do you not notice how much you lip smack? Makes it so hard to listen to. Otherwise, thank you, the content is helpful.

  • @thamburus7332
    @thamburus7332 3 months ago

  • @dmitrikochubei3569
    @dmitrikochubei3569 4 years ago

    Thank you !