Machine Learning | Gradient Descent (with Mathematical Derivations)

  • Added 12 Sep 2024

Comments • 153

  • @RanjiRaj18
    @RanjiRaj18  3 years ago +5

    For notes👉 github.com/ranjiGT/ML-latex-amendments

  • @gabelster3729
    @gabelster3729 1 year ago +5

    You just helped me understand hundreds of web pages that covered these topics in no particular order. Thank you!

  • @sudiptodas6272
    @sudiptodas6272 3 years ago +17

    What is great about this particular video is that these concepts are explained well in many places, like scattered dots, but you connected the dots to paint the whole picture. A worked example of gradient descent is included too, which is very helpful.

    • @RanjiRaj18
      @RanjiRaj18  3 years ago +1

      Thank you for your valuable feedback 😊

  • @donaldngwira
    @donaldngwira 2 years ago +6

    You are such a great teacher. Concepts are clearly explained beginning with the basics and slowly easing into the most advanced level. Thank you

  • @rajapal9736
    @rajapal9736 3 years ago +4

    Hi, your video is helpful for beginners to understand the concept. One suggestion: at the very beginning of the video, when you write the equation of your predicted line, remember to mark it as y(cap) = mx(i) + c. It is not y(i), which is the actual data point.
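
In symbols, the distinction this comment draws (a reference sketch using the comment's notation, not a frame from the video):

$$\hat{y}_i = m x_i + c \;\;\text{(predicted point on the line)}, \qquad y_i \;\;\text{(actual data point)}$$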

  • @yashdhawade5341
    @yashdhawade5341 8 months ago +1

    The best and clearest explanation of gradient descent I've ever listened to. Keep up the good work!🙌

  • @subramaniarumugam6902
    @subramaniarumugam6902 2 years ago +3

    My god, you are perfect. I think your work should reach a wider audience; you are better and clearer than the renowned ML YouTubers. Applause, Ranji!

  • @sumanmondal5276
    @sumanmondal5276 4 years ago +4

    Your hard work made the concept very easy to grasp. Kudos.......

  • @amarnammilton
    @amarnammilton 10 months ago +1

    It's a very good description. The way you teach is humble and appreciable.

  • @Sagar_Tachtode_777
    @Sagar_Tachtode_777 2 years ago +2

    Everything is so easy on this channel, great work Man!

  • @swaroopthomare7237
    @swaroopthomare7237 1 year ago +2

    Hey, thank you so much for this content. Since I started studying regression using your videos, I've become a huge fan of yours.

    • @RanjiRaj18
      @RanjiRaj18  1 year ago +1

      Thank you for your comment. Glad you like it ;)

  • @theysigankumar1671
    @theysigankumar1671 3 years ago +2

    Sir, thank you very much. This has been so helpful, since my course will only get tougher from here onwards and you helped me understand the basics.

  • @manishjain9703
    @manishjain9703 1 year ago +1

    I am a beginner, and as a beginner I was struggling to understand the gradient descent concept. I have seen many videos on gradient descent, but all of them skipped explaining the derivative part. You explained it very well, both total and partial, with a worked solution. Thanks!

  • @vishaldas6346
    @vishaldas6346 4 years ago +1

    Man, you've won my heart. You kept it so simple; this is the best way of explaining gradient descent. Can you please help me with using the learning rate in the equation, and the number of steps used in gradient descent, with an example?

  • @shaun2201
    @shaun2201 3 years ago +2

    Hi @ranjiraj, at 21:37 you have given a wrong explanation of the partial derivative w.r.t. c:
    d/dc (-c) will be -1, so why are you treating it as a constant, when in d/dc it is mx that should be treated as the constant?

  • @aryandeshpande1241
    @aryandeshpande1241 2 years ago +1

    This might be the most underrated explanation on YouTube.

  • @vedanthbaliga7686
    @vedanthbaliga7686 3 years ago +12

    This is what I pay my internet bill for! Thanks a lot!

  • @kvv6671
    @kvv6671 9 months ago +1

    When my ML teacher was teaching this, I felt I was learning rocket science, but when you teach it, it feels very easy. Thank you, Sir 😊

  • @chinmaysrivastava3212
    @chinmaysrivastava3212 6 months ago +1

    I was unable to understand this topic and tried many videos, but this was the most useful video. Thanks!

  • @tusharsub1000
    @tusharsub1000 3 years ago +2

    As far as I know, gradient descent doesn't solve for 'm' and 'c' directly by putting them into two equations the way you did here, because the expression we get after differentiation sometimes becomes very complex, especially with logistic regression, random forests, and other complex models in deep learning, so solving for 'm' and 'c' directly becomes extremely tricky and the time complexity gets very high. Instead, gradient descent solves for 'm' and 'c' by trial and test: start with some dummy values of 'm' and 'c', put those values into the differentiated equation, and check whether the value of the derivative (say D0) for those 'm' and 'c' is 0 or close to 0. If not, subtract that derivative value (D0) from the old values of 'm' and 'c' to get new values, and check again whether the derivative (say D1) for the new 'm' and 'c' comes close to 0. Continue like this until the update leaves 'm' essentially unchanged; that 'm' becomes your actual 'm', and the same is done for 'c'. (A sketch of this iterative scheme follows this thread.)

    • @RanjiRaj18
      @RanjiRaj18  3 years ago

      This video was meant as an intuitive way of understanding gradient descent for beginners. Anyway, I appreciate you taking the time to share your understanding of GD.

    • @ArunKumar-yb2jn
      @ArunKumar-yb2jn 2 years ago

      @@RanjiRaj18 Good explanation. I think you have shown the derivation for the Ordinary Least Squares method. As far as machine learning is concerned, it has to be slightly adapted.

    • @maxpatrickoliviermorin2489
      @maxpatrickoliviermorin2489 1 year ago

      He solved for m and c in this case because it wasn't a very complex example, with only one independent variable. In multiple linear regression it would have been much more complex.
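
A minimal sketch in Python of the iterative scheme the first comment in this thread describes, assuming a fixed learning rate `alpha` and the squared-error cost from the video; the data and parameter values are made up for illustration:

```python
# Sketch only: iterative gradient descent for y = m*x + c with
# cost J = (1/n) * sum((y_i - (m*x_i + c))**2).
def gradient_descent(x, y, alpha=0.01, steps=10_000, tol=1e-9):
    n = len(x)
    m, c = 0.0, 0.0  # dummy starting values, as the comment suggests
    for _ in range(steps):
        # partial derivatives of J w.r.t. m and c (chain rule)
        dj_dm = (-2.0 / n) * sum(xi * (yi - (m * xi + c)) for xi, yi in zip(x, y))
        dj_dc = (-2.0 / n) * sum(yi - (m * xi + c) for xi, yi in zip(x, y))
        if abs(dj_dm) < tol and abs(dj_dc) < tol:  # derivatives ~ 0: converged
            break
        m -= alpha * dj_dm  # step against the gradient, scaled by alpha
        c -= alpha * dj_dc
    return m, c

# Points lying near y = 2x + 1:
print(gradient_descent([1, 2, 3, 4, 5], [3.1, 4.9, 7.2, 9.0, 11.1]))
```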

  • @alhassanturay7233
    @alhassanturay7233 9 months ago +1

    Truly, you're the best. You solved my long-time machine learning challenge.

  • @jayaprakashs4412
    @jayaprakashs4412 5 months ago

    Very good explanation. It would've been good if you could've explained the usage of the learning rate to find the minimum point.

  • @classy_Girl8920
    @classy_Girl8920 1 year ago +1

    Perfect video for understanding the core concept, amazing. I love the explanation. Thank you so much!

  • @maneetsaluja
    @maneetsaluja 1 year ago +1

    Great explanation with an example.
    This is the way to explain such concepts.

    • @RanjiRaj18
      @RanjiRaj18  1 year ago

      Thank you for the comment. Happy Learning!

  • @vigneshwar2897
    @vigneshwar2897 6 months ago +1

    Will the sign (direction) for calculating m and b at the end change from addition to subtraction if we take (y_pred - y) instead of (y - y_pred), like you have done in the cost function? I saw a few articles where this was mentioned, but it was not clear.

  • @lingadevaruhp5576
    @lingadevaruhp5576 6 months ago +1

    Really amazing, thank you so much sir, keep rocking

  • @sundar8147
    @sundar8147 1 year ago +2

    Thanks for the clear explanation sir

  • @trendhindifacts
    @trendhindifacts 5 months ago +1

    Well explained, bro ❤ Just make another video for statistics and linear algebra 🎉

  • @BADURELGADIR-dd2ck
    @BADURELGADIR-dd2ck 2 months ago +1

    simple and useful lecture.. thanks

  • @sonnyarulanandam
    @sonnyarulanandam 3 months ago +1

    Very good explanation of gradient descent

  • @MrAmarSindol
    @MrAmarSindol 2 years ago +1

    Killer explanation!! Amazingly amazing!! Thank you, bro!

  • @bhamidimaharshi
    @bhamidimaharshi 1 month ago

    explanation is simply awesome.....

  • @aaryan3461
    @aaryan3461 2 years ago +1

    Great video man. Loved it.

  • @user-dd7el9dp3d
    @user-dd7el9dp3d 1 month ago

    At 20:37, you discarded the n/2 factor. I would like to know why.

  • @anushadevi4937
    @anushadevi4937 3 years ago +1

    Thank you so much, I got a clear picture of the topic now.

  • @danielsehnoutek2016
    @danielsehnoutek2016 3 months ago

    Absolutely the best explanation

    • @danielsehnoutek2016
      @danielsehnoutek2016 3 months ago

      If I got it right, your last example is the analytical solution, but since that can't always be done, we use the iterative solution with the learning rate alpha?

  • @salmansayyad4522
    @salmansayyad4522 3 years ago +1

    Thanks a lot, sir! It was really helpful. Excellent explanation.

  • @Venomus658
    @Venomus658 3 years ago +3

    Thank you! Wish me luck on my exam about it!

  • @Rambabukatta-ox6tc
    @Rambabukatta-ox6tc 4 months ago +1

    very nicely explained Bro

  • @vasachisenjubean5944
    @vasachisenjubean5944 3 years ago +1

    you earned a subscriber my man

  • @Nudaykumar
    @Nudaykumar 4 years ago

    Hi, one question here:
    First derivative of x²: 2x.
    Second derivative: 2 (replaced in the same location).
    Third derivative: 0.
    Applying the same to the mean squared error formula:
    First derivative: I understood the square becoming 2/n Σ(···).
    The next derivative, with respect to the slope, should then be
    2/n Σ i=1 to n (yi - xi - c), where I applied the derivative by replacing in place,
    since -mxi converts to -xi.
    But in your explanation, instead of replacing, you brought the derivative factor to the front, as below:
    2/n Σ i=1 to n -xi(yi - mxi - c).
    The same for the intercept.
    One more thing: at the end, what happened to the 2/n?
    Please correct me if I am wrong.

    • @RanjiRaj18
      @RanjiRaj18  4 years ago +1

      2/n Σ i=1 to n -xi(yi - mxi - c) comes from the chain rule; watch that part of the video carefully again. The factor (yi - mxi - c) is what remains from differentiating (yi - mxi - c)², and since we want to differentiate with respect to the slope m, you then take the derivative of the inner expression, treating yi and c as constants; what's left is -mxi, which gives -xi, and you multiply that by (yi - mxi - c). 2/n is a constant: for any n, say n = 5, 2/5 is a constant, and since you eventually equate the whole expression to zero, it vanishes. Hope you understand now!
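
For reference, the chain-rule step spelled out in full, using the cost function as quoted in this thread (a sketch of the algebra, not a transcript of the video):

$$J(m,c) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - m x_i - c\right)^2$$

$$\frac{\partial J}{\partial m} = \frac{2}{n}\sum_{i=1}^{n}\left(y_i - m x_i - c\right)(-x_i), \qquad \frac{\partial J}{\partial c} = \frac{2}{n}\sum_{i=1}^{n}\left(y_i - m x_i - c\right)(-1)$$

Setting both partial derivatives to zero, the nonzero constant factor 2/n divides out, which is why it disappears from the final equations.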

  • @manasagowrikottur8242
    @manasagowrikottur8242 4 years ago +1

    Thanks for this tutorial, sir. It made it very easy and simple.

  • @pursuitofgrowthwithtr
    @pursuitofgrowthwithtr 4 years ago +1

    your videos are really nice, good content and presentation...keep it up sir.

  • @umermehboob5630
    @umermehboob5630 3 years ago

    That is an explanation! I have one question: where would the learning rate actually be used in the computation, as in your numerical example? We found the outputs and calculated the corresponding m and c; how is the learning rate catered for? Secondly, when we multiply the learning rate by the derivative, what does it give us?

  • @dheerajverma189
    @dheerajverma189 3 months ago

    Sir, linear regression is not used for classification, as you said at the start of the video while explaining.

  • @1234wellwell
    @1234wellwell 3 years ago +1

    Thanks so much for the video. It helped me a lot.

  • @sarasijbasumallick4036
    @sarasijbasumallick4036 2 years ago +1

    Can you please tell me why the curve is much sharper when you draw the graph of J with respect to c? Please tell.

  • @sanusimuhammad7466
    @sanusimuhammad7466 10 months ago

    I have watched this video over and over again; it is the most satisfying video I have seen as far as gradient descent is concerned. But I have questions: (1) what happened to the 2 that became a multiplier of the function, as the chain rule implies, and (2) what happened to the n in the cost function? I know it's the mean squared error thing. In my small assumption, neither of the values can be thrown away just like that, mathematically. Please help with an explanation.

  • @SrikantBhusan
    @SrikantBhusan 5 months ago

    Between the timestamps 21:39 and 21:45 you said that the partial derivative of y(i) - mx - c with respect to c is 0, so you only take the minus sign. That is wrong: it will be -1, because here c is not a constant.

  • @anarkaliprabhakar6640
    @anarkaliprabhakar6640 1 year ago +1

    Sir, you explained it so well.

  • @RajSingh-ik3og
    @RajSingh-ik3og 2 months ago +1

    Great explanation!

  • @patelraj3140
    @patelraj3140 4 years ago +1

    Thank you so much sir for such a perfect explanation....🙏🙏👏👏👏

  • @Shivendra7277
    @Shivendra7277 3 years ago +1

    Thank you, sir.
    Please make more videos on machine learning concepts.

  • @testenma5155
    @testenma5155 4 years ago

    Hi Ranji Sir, I have a doubt. At 17:00 you mentioned that we need to take the derivative because we have two variables, and you named the variables as x and c. But I think you meant to say m and c; later, at 17:27, you mention the two parameters m and c. Please verify whether this is correct. If I have pointed it out wrongly, please forgive me.

    • @RanjiRaj18
      @RanjiRaj18  4 years ago

      Yes, you are correct; we have to take the derivative w.r.t. m and c.

  • @helloworld2740
    @helloworld2740 2 years ago +1

    Really nice approach to teaching.
    Thank you, sirji!

  • @sherifbadawy8188
    @sherifbadawy8188 2 years ago +1

    one of the best

  • @alirezasoleimani2524
    @alirezasoleimani2524 5 days ago

    very nice explanation

  • @OpeLeke
    @OpeLeke 2 years ago +1

    excellent video

  • @nileshpandey5724
    @nileshpandey5724 2 years ago +1

    thank you so much sir

  • @sabeenao.m7388
    @sabeenao.m7388 10 days ago +1

    Thank you, sir 😊

  • @mahmoodapurbo5537
    @mahmoodapurbo5537 7 months ago +2

    Thanks bro.

  • @ramnarayan3323
    @ramnarayan3323 3 years ago +1

    Thanks ...very well explained

  • @suhasrewatkar9001
    @suhasrewatkar9001 1 month ago

    Best explanation sir

  • @nightsky5037
    @nightsky5037 1 year ago

    Why do we set the derivative equal to 0? I mean, the gradient at the minimum might not be equal to zero for all curves.

  • @praveenkumar-nh5qs
    @praveenkumar-nh5qs 4 years ago +1

    Nicely explained.

  • @aerogrampur
    @aerogrampur 3 years ago +1

    keep up the good work !

  • @nchoreanthony4294
    @nchoreanthony4294 10 days ago

    What about when someone is working with the learning rate?

  • @debrajnath6031
    @debrajnath6031 3 years ago +1

    The explanation of the mathematical formula is absolutely fantastic. The explanation was for a single feature; but if we have multiple features, what has to change in the equation? Can you please let us know? Thanks very much; we would love to see more videos of this kind soon.

    • @RanjiRaj18
      @RanjiRaj18  3 years ago

      In the case of multiple features or weights, we have to consider them individually by taking the partial derivatives. This video is just a general idea of gradient descent. Hope it answers your question.
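
A small sketch of how this generalizes, assuming a weight vector w with one slope per feature plus an intercept c; the NumPy usage and values are illustrative, not from the video:

```python
import numpy as np

# Sketch only: gradient descent with several features.
# X has shape (n_samples, n_features); w holds one slope per feature.
def gd_multi(X, y, alpha=0.01, steps=20_000):
    n, d = X.shape
    w, c = np.zeros(d), 0.0
    for _ in range(steps):
        residual = y - (X @ w + c)             # y_i - y_hat_i
        dj_dw = (-2.0 / n) * (X.T @ residual)  # one partial derivative per weight
        dj_dc = (-2.0 / n) * residual.sum()
        w -= alpha * dj_dw
        c -= alpha * dj_dc
    return w, c

# Example with two features, generated from y = 2*x1 + 3*x2 + 1:
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [4.0, 2.0]])
y = np.array([9.0, 8.0, 16.0, 15.0])
print(gd_multi(X, y))  # approaches w ≈ [2, 3], c ≈ 1
```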

  • @shivammodi1105
    @shivammodi1105 3 years ago +1

    Lovely explanation

  • @mihirnaik3383
    @mihirnaik3383 2 years ago +1

    Thanks Buddy :)

  • @fpl8648
    @fpl8648 3 years ago +1

    thank you!!!

    • @fpl8648
      @fpl8648 3 years ago

      It was very helpful in writing a thesis. Could you also indicate some bibliography for citations?

  • @suryakrishna760
    @suryakrishna760 2 years ago

    What should I do if I want to apply a learning rate of some specific value?

  • @mariawilson6807
    @mariawilson6807 4 years ago +2

    Sir, my maths is quite weak, and I want to start my career in data science. I know that I can make my maths strong, but how should I start learning maths for data science?

    • @RanjiRaj18
      @RanjiRaj18  4 years ago +1

      Hello Maria, you can refer to websites like www.mathsisfun.com/ to learn the basics. Hope it helps!

    • @mariawilson6807
      @mariawilson6807 4 years ago

      @@RanjiRaj18 thanks sir

    • @mariawilson6807
      @mariawilson6807 4 years ago

      @@RanjiRaj18 Sir, that is very low-level mathematics; I am in SYBSc IT.

  • @sudhansumtripathy
    @sudhansumtripathy 1 year ago

    Hi sir, do you have the Python code using TensorFlow, or do you have any recordings of ML using TF?

  • @ayushsingh-qn8sb
    @ayushsingh-qn8sb 3 years ago +1

    Great explanation!

  • @AdityaSingh-lf7oe
    @AdityaSingh-lf7oe 4 years ago

    Hi Ranji sir, I wanted to ask: if our line is of the form M1*(feature1) + M2*(feature2) + ... + Mn*(feature n) + c, do we have to follow the same steps and calculate dJ/dM for each of M1, M2, ..., Mn?

  • @chaithanyack
    @chaithanyack 3 months ago

    What type is this, batch gradient descent?

  • @dineshlogu9368
    @dineshlogu9368 3 years ago

    Can you please explain to me why we are squaring at the step at 4:36? Everything is clear to me except this one squaring step, which I can't understand.

  • @mayank265memories
    @mayank265memories 3 years ago

    Amazing lecture. One note: x^n will not have its 3rd-order derivative equal to 0; it is the (n+1)-th order derivative that is 0.

  • @RahulTiwari-oe1ww
    @RahulTiwari-oe1ww 2 years ago +1

    Well explained

  • @aqharinasrin7002
    @aqharinasrin7002 1 year ago

    Dear sir,
    I still cannot connect what the purpose is of doing m = m - lambda * dJ/dm and c = c - lambda * dJ/dc.

    • @leninfonseca7129
      @leninfonseca7129 1 year ago

      Yes, exactly. Please explain the derivation of these two equations.

  • @sirajmotaung6930
    @sirajmotaung6930 3 years ago

    Thank you so much. Quick question: when/how do we use the learning rate in this regard?

    • @RanjiRaj18
      @RanjiRaj18  3 years ago +1

      If I understood your question correctly:
      When? The learning rate is used for convergence; it should be neither too large nor too small, just optimal, so that your training process completes.
      How? You can use a learning rate schedule, or you can use optimizers like Adam.

    • @sirajmotaung6930
      @sirajmotaung6930 3 years ago

      @@RanjiRaj18 Yes, alright. Thank you so much; your vid was really helpful.
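
A toy sketch of the "learning rate schedule" idea from the reply above, assuming simple exponential decay; alpha0 and decay are made-up values, and Adam would come from a library's optimizers (e.g. TensorFlow or PyTorch) rather than from this snippet:

```python
# Exponential-decay schedule: alpha_t = alpha0 * decay**t.
# Start larger for fast progress, shrink over time for stability.
def lr_schedule(step, alpha0=0.1, decay=0.999):
    return alpha0 * decay**step

# Inside a training loop the update would read, e.g.:
#   m -= lr_schedule(t) * dj_dm
for t in (0, 100, 1000, 5000):
    print(t, round(lr_schedule(t), 6))
```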

  • @suryatej839
    @suryatej839 2 years ago

    Is it a sweet spot in the middle of a hyperplane?

  • @OpeLeke
    @OpeLeke 2 years ago

    Can this method work for an equation with multiple slopes?

  • @apoorva3635
    @apoorva3635 2 years ago

    Why do we need partial derivative when we have the total derivative?

    • @RanjiRaj18
      @RanjiRaj18  2 years ago

      When there are relatively many coefficients in your model, taking the total derivative would be a difficult task, as would estimating the optimal parameters. Partial derivatives reduce the workload by keeping one parameter constant while determining the other.

  • @pavankumar8673
    @pavankumar8673 2 years ago

    Linear regression for classification at 0:20???

  • @dineshlogu9368
    @dineshlogu9368 3 years ago

    Thank you so much, but I have a small clarification regarding differentiation: why do we differentiate with respect to m and c, and why should we not differentiate with respect to x to find the value of y?

    • @RanjiRaj18
      @RanjiRaj18  3 years ago +1

      Because m and c are the weights we want to determine, which will give the best equation for curve fitting.

    • @dineshlogu9368
      @dineshlogu9368 3 years ago

      @@RanjiRaj18 Thank you so much for taking the time to respond to my comment.

  • @testenma5155
    @testenma5155 4 years ago +1

    How did the 2/n go away from the equation when dJ/dm and dJ/dc were set to 0?

    • @RanjiRaj18
      @RanjiRaj18  4 years ago +2

      2/n is a constant; say you take n = 5, so it becomes 2/5. Since the whole expression is equated to zero, a nonzero constant factor like this can be divided out.

    • @testenma5155
      @testenma5155 4 years ago

      @@RanjiRaj18 Thank you Ranji

  • @rameshthamizhselvan2458
    @rameshthamizhselvan2458 4 years ago +1

    Excellent...

  • @iramarshad700
    @iramarshad700 3 years ago

    So gradient descent is our cost function to calculate the error?

  • @mohamedelbatoty
    @mohamedelbatoty 3 years ago +1

    Thanks Bro :)

  • @sandhu6355
    @sandhu6355 1 year ago

    Bro, please answer this question: why are we taking the summation of c in one equation and not in the other? One of them gives 5c. Why?

  • @divyamohan6113
    @divyamohan6113 4 years ago +2

    thanks a lot..... :)

  • @viddeshk8020
    @viddeshk8020 2 years ago +1

    🙂👍 nice like that

  • @mariawilson6807
    @mariawilson6807 4 years ago

    Which level of maths is required: 11th and 12th grade, or degree-level mathematics?

    • @RanjiRaj18
      @RanjiRaj18  4 years ago +2

      Personally, both: derivatives, differential equations, matrices, and vector concepts.

    • @mariawilson6807
      @mariawilson6807 4 years ago +1

      @@RanjiRaj18 thanks

  • @abhiaaron1715
    @abhiaaron1715 1 year ago

    Why did you multiply the 2nd equation by 5 at the end?

    • @RanjiRaj18
      @RanjiRaj18  1 year ago

      To balance the equation on both sides for cancellation. Those are basic algebraic rules.

  • @shafiqahmad9057
    @shafiqahmad9057 3 years ago

    Sir, can you recommend a book on machine learning with a mathematical background, please?

    • @RanjiRaj18
      @RanjiRaj18  3 years ago

      You can refer to the book by `Tom Mitchell`.

    • @shafiqahmad9057
      @shafiqahmad9057 3 years ago

      @@RanjiRaj18 Sir, please, what is the book's name? It would be better if you could share a PDF link.

    • @RanjiRaj18
      @RanjiRaj18  3 years ago

      @@shafiqahmad9057 You can check on Google; it is open source.

    • @shafiqahmad9057
      @shafiqahmad9057 3 years ago

      @@RanjiRaj18 Thank you for the very fast response.

  • @mohdhashim2321
    @mohdhashim2321 4 years ago

    Sir, after explaining, please step aside a little; we need to take screenshots.

  • @laodrofotic7713
    @laodrofotic7713 3 years ago

    J = 1/(2*m) * sum((h(x) - y)^2), where h(x) is the hypothesis and y the actual value... at 3:37 you got them mixed up, right? Damn, man... no wonder people get confused.

  • @someshh7263
    @someshh7263 2 years ago

    Sorry bro, it's not clear what you're saying.