Tutorial 12- Stochastic Gradient Descent vs Gradient Descent

  • added 27. 07. 2019
  • Below are the various playlists created on ML, Data Science and Deep Learning. Please subscribe and support the channel. Happy Learning!
    Deep Learning Playlist: • Tutorial 1- Introducti...
    Data Science Projects playlist: • Generative Adversarial...
    NLP playlist: • Natural Language Proce...
    Statistics Playlist: • Population vs Sample i...
    Feature Engineering playlist: • Feature Engineering in...
    Computer Vision playlist: • OpenCV Installation | ...
    Data Science Interview Question playlist: • Complete Life Cycle of...
    You can buy my book on Finance with Machine Learning and Deep Learning from the URL below.
    Amazon URL: www.amazon.in/Hands-Python-Fi...
    🙏🙏🙏🙏🙏🙏🙏🙏
    YOU JUST NEED TO DO
    3 THINGS to support my channel
    LIKE
    SHARE
    &
    SUBSCRIBE
    TO MY YOUTUBE CHANNEL

Comments • 96

  • @BalaguruGupta
    @BalaguruGupta 3 years ago +14

    Amazing explanation Sir! You'll always be the hero for AI enthusiasts. Thanks a lot!

  • @nagesh866
    @nagesh866 3 years ago +5

    What an amazing teacher you are. Crystal clear.

  • @saurabhnigudkar6115
    @saurabhnigudkar6115 4 years ago +5

    Best Deep Learning playlist on YouTube.

  • @ravindrav1895
    @ravindrav1895 2 years ago +1

    Whenever I am confused about some topic, I come back to this channel and watch your videos, and it helps me a lot, sir. Thank you, sir, for the amazing explanation.

  • @ajithtolroy5441
    @ajithtolroy5441 4 years ago +2

    I saw many videos, but this one is quite comprehensible and informative.

  • @lakshminarasimhanvenkatakr3754

    This is an excellent explanation, with such granular detail that anyone can understand it.

  • @shashanktripathi3034
    @shashanktripathi3034 3 years ago +6

    Krish sir, your YouTube channel is just like the GITA for me: as one gets all the answers to life in the GITA, I get all my doubts cleared on your channel.
    Thank you, Sir.

    • @kartikdave659
      @kartikdave659 3 years ago

      After becoming a member, how can I get the data science material? Can you please tell me?

  • @archanamaurya89
    @archanamaurya89 3 years ago +6

    This video is such a light bulb moment for me :D Thank you so very much!!

  • @fedisalhi6320
    @fedisalhi6320 4 years ago +8

    Excellent explanation, it was really helpful. Thank you.

  • @nitayg1326
    @nitayg1326 4 years ago +15

    My God! Finally I am clear about GD, SGD and mini-batch SGD!

  • @severnsevern1445
    @severnsevern1445 3 years ago

    Great explanation. Very clear. Thanks!

  • @VVV-wx3ui
    @VVV-wx3ui 4 years ago +1

    Superb... simply superb. Understood the concept now from the loss function. Well done Krish.

  • @Skandawin78
    @Skandawin78 4 years ago

    Your videos are an excellent reference for brushing up on these concepts.

  • @allaboutdata2050
    @allaboutdata2050 4 years ago +1

    What an explanation 🧡. Great!! Awesome!!

  • @taranilakshmi9680
    @taranilakshmi9680 4 years ago

    Explained very well. Thank you.

  • @tonyzhang2501
    @tonyzhang2501 3 years ago +1

    Thank you, it is a clear explanation. I got it!

  • @bhavanapurohit2627
    @bhavanapurohit2627 3 years ago +2

    Hi, is it completely theoretical or will you code in further sessions?

  • @khuloodnasher1606
    @khuloodnasher1606 4 years ago

    Really, this is the best video I've ever seen explaining the concept, better than a famous school.

  • @gayathrijpl
    @gayathrijpl a year ago

    Such a clean way of explaining.

  • @rabidub733
    @rabidub733 3 months ago

    Thanks for this! Great explanation.

  • @gauravsingh2425
    @gauravsingh2425 4 years ago

    Thanks Krish!!! Very nice explanation.

  • @chinmaybhat9636
    @chinmaybhat9636 4 years ago

    Awesome @KrishNaik Sir.

  • @uttamchoudhary5229
    @uttamchoudhary5229 5 years ago +1

    Great video man 👍👍. Please keep it up. I am waiting for the next videos.

  • @vinuvarshith6412
    @vinuvarshith6412 a year ago

    Top notch explanation!

  • @aditisrivastava7079
    @aditisrivastava7079 4 years ago +2

    Just wanted to ask if you could also suggest some good online resources that we can read, which could bring more clarity...

  • @ArthurCor-ts2bg
    @ArthurCor-ts2bg 4 years ago

    Krish, you convey the subject concisely and most meaningfully.

  • @guytonedhai
    @guytonedhai a year ago

    How are you so good at explaining 😭😭😭😭😭 Thanks a lot ♥♥♥

  • @sandipansarkar9211
    @sandipansarkar9211 4 years ago +1

    Thanks Krish. Good video. I want to use all this knowledge in my next batch of deep learning by ineuron.

  • @Kurtmind
    @Kurtmind 2 years ago

    Excellent explanation Sir!

  • @syedsaqlainabatool3399

    This is what I was looking for.

  • @akfvc8712
    @akfvc8712 3 years ago

    Great video, excellent effort. Appreciated!!

  • @nansonspunk
    @nansonspunk a year ago

    Yes, I really liked this explanation, thanks.

  • @rameshthamizhselvan2458

    Excellent!

  • @alsabtilaila1923
    @alsabtilaila1923 3 years ago

    Great one!

  • @ashwanikumar-zh1mq
    @ashwanikumar-zh1mq 3 years ago

    Good, clearly explained. Nobody can explain it like this.

  • @rdf1616
    @rdf1616 4 years ago

    Good explanation! Thanks!

  • @koustavdutta5317
    @koustavdutta5317 3 years ago +2

    Hi Krish, one request: like this playlist, please make long videos for the ML playlist on the loss functions and optimizers used in various ML algorithms, mainly in the case of classification algorithms.

  • @ting-yuhsu4229
    @ting-yuhsu4229 4 years ago

    You are AWESOME! :)

  • @response2u
    @response2u 2 years ago

    Thank you, sir!

  • @aminuabdulsalami4325
    @aminuabdulsalami4325 4 years ago

    Great guy.

  • @praneethcj6544
    @praneethcj6544 4 years ago

    Perfect ..!!!

  • @RaviRanjan_ssj4
    @RaviRanjan_ssj4 4 years ago

    great video !!

  • @nikkitha92
    @nikkitha92 4 years ago +1

    Sir, your videos are amazing. Can you please explain the latest methodologies such as BERT and ELMo?

  • @SandeepKashyap-ek2hx
    @SandeepKashyap-ek2hx 2 years ago

    You are a HERO sir

  • @Anand-uw2uc
    @Anand-uw2uc 4 years ago +9

    Good explanation! But you did not speak much about when to use SGD, although you explained GD and mini-batch SGD more clearly.

    • @vishaldas6346
      @vishaldas6346 3 years ago +1

      There is not much to explain about SGD when you are taking 1 data point at a time from a dataset of 1000 data points.
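      A minimal sketch of that one-record-at-a-time update (not from the video; the synthetic data, learning rate, and step count are all assumptions):

          # SGD for simple linear regression: each update uses ONE randomly
          # chosen data point out of the 1000, instead of the whole dataset.
          import numpy as np

          rng = np.random.default_rng(0)
          X = rng.uniform(0, 1, 1000)               # 1000 data points, as above
          y = 3.0 * X + rng.normal(0, 0.1, 1000)    # true slope = 3.0, b taken as 0

          w, lr = 0.0, 0.1                          # assumed start and learning rate
          for step in range(5000):
              i = rng.integers(0, len(X))           # pick a single random record
              grad = -2 * X[i] * (y[i] - w * X[i])  # d/dw of (y - w*x)^2 at one point
              w -= lr * grad                        # noisy one-point update
          print(w)                                  # ends up close to the true 3.0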

  • @sreejus8218
    @sreejus8218 3 years ago

    If we use a sample of the output to find the loss, will we use its derivative to change the whole set of weights, or only the weights of the respective output?

  • @jiayuzhou6051
    @jiayuzhou6051 a month ago

    The only video that explains it.

  • @vishaljhaveri7565
    @vishaljhaveri7565 2 years ago

    Thank you sir.

  • @goodnewsdaily-tamil1990

    1000 likes for you man👏👍

  • @ruchikalalit1304
    @ruchikalalit1304 4 years ago +1

    Have you made videos on the practical implementation of all this work? If so, please share the links.

  • @achrafkmout9398
    @achrafkmout9398 3 years ago

    very good explanation

  • @vineetagarwal18
    @vineetagarwal18 a year ago

    Great Sir

  • @siddharthachatterjee9959

    Good attempt 👍. Please record with camera on manual focus.

  • @phaneendra3700
    @phaneendra3700 3 years ago

    hats off man

  • @louerleseigneur4532
    @louerleseigneur4532 3 years ago

    Thanks buddy

  • @percyjardine5724
    @percyjardine5724 3 years ago

    thanks Krish

  • @AjanUnderscore
    @AjanUnderscore 2 years ago

    Thank u sir 🙏🙏🙌🧠🐈

  • @thanicssubakar6303
    @thanicssubakar6303 5 years ago +1

    Nice bro

  • @sathvikambati3464
    @sathvikambati3464 a year ago

    Thanks

  • @muhammedsahalot8683
    @muhammedsahalot8683 a month ago

    Which has the higher convergence speed, SGD or GD?

  • @rababmaroc3354
    @rababmaroc3354 4 years ago

    Thank you very much for your efforts. Please, how can we solve a portfolio allocation problem using this algorithm? Please answer me.

  • @muralimohan6974
    @muralimohan6974 3 years ago

    How can we take k inputs at the same time?

  • @rohitsaini8480
    @rohitsaini8480 a year ago

    Sir, please solve my problem. In my view, we are doing gradient descent to find the best value of m (the slope in the case of linear regression, considering b = 0). So if we use all the points, then we come to know at which value of m the loss is lowest, so why do we have to use a learning rate to update the weights when we already know the best value?

  • @bijaynayak6473
    @bijaynayak6473 4 years ago +5

    Hello Sir, could you share the link to the code you explained? This video series is very nice; in a short period we can cover so many concepts. :)

  • @r7918
    @r7918 3 years ago

    I have one question regarding this topic. This concept is applicable to linear regression, right?

  • @yukeshnepal4885
    @yukeshnepal4885 4 years ago +2

    8:58, using GD it converges quickly, while using mini-batch SGD it follows a zigzag path. How??

    • @kannanparthipan7907
      @kannanparthipan7907 4 years ago +1

      In the case of mini-batch SGD, we are considering only some points, so there will be some deviation in the calculation compared to usual gradient descent, where we consider all values. A simple example: GD is like the total population and mini-batch SGD is like a sample of the population; they will never be equal, and the sample distribution will always deviate somewhat from the total population distribution.
      We can't use GD everywhere due to the computation-time factor; using mini-batch SGD will give an approximately correct result.

    • @bhargavpotluri5147
      @bhargavpotluri5147 4 years ago +1

      @@kannanparthipan7907 Deviation will be there in the final output or the final converged result. The question is why we have it during the process of convergence. Also, if we consider different samples for every epoch, then it is understood that there can be zigzag results during convergence. But if only one sample of k records is considered, then why the zigzag during convergence?

    • @bhargavpotluri5147
      @bhargavpotluri5147 4 years ago +2

      OK, now I got it. For every iteration, samples are picked at random, hence the zigzag. Just went through other articles.
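      To make the zigzag concrete, a sketch comparing the two (the synthetic data, learning rate 0.1, and batch size 32 are all assumptions): full-batch GD uses the exact gradient, so its loss falls smoothly, while mini-batch SGD estimates the gradient from a fresh random sample each step, so its path wobbles around the same downward trend.

          import numpy as np

          rng = np.random.default_rng(42)
          X = rng.uniform(0, 1, 1000)
          y = 3.0 * X + rng.normal(0, 0.1, 1000)

          def grad(w, xb, yb):
              # dMSE/dw computed on whatever batch is passed in
              return -2 * np.mean(xb * (yb - w * xb))

          w_gd = w_mb = 0.0
          for step in range(200):
              w_gd -= 0.1 * grad(w_gd, X, y)            # exact gradient: smooth path
              idx = rng.integers(0, len(X), size=32)    # fresh random mini-batch
              w_mb -= 0.1 * grad(w_mb, X[idx], y[idx])  # noisy gradient: zigzag path
          print(w_gd, w_mb)                             # both approach the true slope 3.0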

  • @pareesepathak7348
    @pareesepathak7348 3 years ago

    Can you share the paper for reference, and also resources for deep learning for image processing?

  • @ankitbiswas8380
    @ankitbiswas8380 2 years ago

    When you mentioned that SGD takes place in linear regression, I didn't understand that comment. Even in your linear regression videos, for the mean squared error we have a sum of squares over all data points. So how did SGD get linked to linear regression?

  • @a.sharan8876
    @a.sharan8876 a year ago

    py:28: RuntimeWarning: overflow encountered in scalar power
    cost = (1/n)*sum([value**2 for value in (y - y_predicted)])
    Hey bro, I am stuck here with this error and could not understand the error itself. Could you suggest a solution? ... I just now started practicing ML algorithms.
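    A likely cause and fix, sketched below (an assumption, since the full script isn't shown): this overflow usually means the updates diverged, typically because the learning rate is too large for the data's scale, so y - y_predicted grows each iteration until value**2 overflows. Lowering the learning rate (or scaling the features) normally makes the same cost line behave:

        import numpy as np

        x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
        y = np.array([5.0, 7.0, 9.0, 11.0, 13.0])   # y = 2x + 3
        n = len(x)

        w = b = 0.0
        lr = 0.01                    # small enough to keep the updates stable
        for i in range(10000):
            y_predicted = w * x + b
            cost = (1 / n) * sum(v ** 2 for v in (y - y_predicted))
            w -= lr * (-(2 / n) * np.sum(x * (y - y_predicted)))
            b -= lr * (-(2 / n) * np.sum(y - y_predicted))
        print(w, b, cost)            # converges near w=2, b=3 instead of overflowing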

  • @_JoyshreeMozumder
    @_JoyshreeMozumder 3 years ago

    What is the source of the data points?

  • @shubhangiagrawal336
    @shubhangiagrawal336 3 years ago

    good video

  • @manojsalunke2842
    @manojsalunke2842 4 years ago

    At 9:28 you said SGD will take more time to converge than GD, so which is faster, SGD or GD????

  • @abhrapuitandy3327
    @abhrapuitandy3327 4 years ago

    Please do tell about stochastic gradient ascent also.

  • @samiabidah4197
    @samiabidah4197 3 years ago

    Please, what is the difference between GD and batch GD?

  • @khushboosoni2788
    @khushboosoni2788 a year ago

    Sir, can you explain the SPGD algorithm to me, please?

  • @minakshiboruah1356
    @minakshiboruah1356 3 years ago

    @12:02 Sir, it should be mini-batch stochastic G.D.

  • @jsverma143
    @jsverma143 4 years ago +1

    Negative and positive slopes, best explained as:
    since the angle of the tangent is more than 90 degrees on the left side of the curve, the slope comes out -ve, while on the other side it is less than 90 degrees, so it would be +ve.
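    Restating that geometry as equations (standard gradient-descent notation, not from the video): the gradient is the slope of the tangent line, and each update moves the weight against the gradient's sign:

        \frac{\partial L}{\partial w} = \tan\theta, \qquad
        w_{\text{new}} = w_{\text{old}} - \eta\,\frac{\partial L}{\partial w}

    So on the left of the minimum, \theta > 90^\circ gives \tan\theta < 0 and the update increases w; on the right, \theta < 90^\circ gives \tan\theta > 0 and the update decreases w.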

  • @soheljagirdar8830
    @soheljagirdar8830 3 years ago +1

    4:17 SGD has a minimum of 256 records to find the error/minima; you said it's 1 record at a time.

    • @pramodyadav4422
      @pramodyadav4422 3 years ago +1

      I read a few articles which say that in SGD "a randomly chosen data point is picked from the whole data set at each iteration". The 256 records you're talking about may be mini-batch SGD: "It is also common to sample a small number of data points instead of just one point at each step, and that is called 'mini-batch' gradient descent."

    • @tejasvigupta07
      @tejasvigupta07 3 years ago

      @@pramodyadav4422 Yeah, even I have read that in SGD only one data point is selected and updated in each iteration instead of all.
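      The distinction is just the batch size used per weight update; a small sketch of how the variants carve up the same dataset (the function name and sizes here are illustrative, not from the video):

          import numpy as np

          def epoch_batches(N, batch_size, rng):
              idx = rng.permutation(N)                  # shuffle once per epoch
              for start in range(0, N, batch_size):
                  yield idx[start:start + batch_size]   # indices for one update

          rng = np.random.default_rng(0)
          for batch in epoch_batches(1000, 256, rng):
              pass  # each `batch` drives one weight update

          # batch_size = 1    -> SGD: one record per update
          # batch_size = 256  -> mini-batch SGD (the 256 mentioned above)
          # batch_size = N    -> plain (full-batch) gradient descent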

  • @funpoint3966
    @funpoint3966 3 months ago

    Please work out your camera issue; it seems to be set to auto-focus, resulting in a little disturbance.

  • @shekharkumar1902
    @shekharkumar1902 4 years ago

    Confusing one!

  • @atchutram9894
    @atchutram9894 4 years ago

    Switch off the auto-focus feature on your camera. It is distracting.

  • @chalapathinagavarmabhupath8432

    Your videos are good but the camera was bad.

  • @devaryan2201
    @devaryan2201 2 years ago

    Do change your method of teaching; it seems like someone has read a book and is just trying to copy that content from one side... use your own ideas for it
    :)