Gradient Boosting In Depth Intuition- Part 1 Machine Learning

  • Added 4 Sep 2024

Comments • 157

  • @krishnaik06
    @krishnaik06  4 years ago +350

    Trust me, I took 10 retakes to make this video. Please do subscribe to my channel and share it with everyone :) Happy learning!

    • @bibhupatri1811
      @bibhupatri1811 4 years ago +6

      Hi sir, your previous video on XGBoost is the same as AdaBoost. Please make a separate video explaining XGBoost.

    • @arjundev4908
      @arjundev4908 4 years ago +7

      Your constant effort to contribute to the DS community sends chills down my spine... What amazing dedication 😊 👍 ✌

    • @krishnaik06
      @krishnaik06  4 years ago +10

      Yes, the XGBoost video will be uploaded after gradient boosting.

    • @smitsG
      @smitsG 4 years ago +1

      Hats off to your dedication.

    • @sairajesh5413
      @sairajesh5413 4 years ago

      Thanks a lot, Krish Naik.

  • @bhavikdudhrejiya852
    @bhavikdudhrejiya852 3 years ago +27

    Excellent video.
    Below are the points jotted down from this video:
    1. We have data
    2. Create a base learner
    3. Predict the salary from the base learner
    4. Compute the loss function and extract the residuals
    5. Add a sequential decision tree
    6. Predict the residuals, giving experience and salary as predictors and the residual as the target
    7. Predict the salary from the base learner's salary prediction and the decision tree's residual prediction
    - Salary Prediction = Base Learner Prediction + Learning Rate * Decision Tree Residual Prediction
    - The learning rate is in the range 0 to 1
    8. Compute the loss function and extract the residuals
    9. Steps 5 to 8 repeat as iterations; in each iteration a decision tree is added sequentially and the salary is predicted:
    - Salary Prediction = Base Learner Prediction
                        + Learning Rate * Decision Tree Residual Prediction 1
                        + Learning Rate * Decision Tree Residual Prediction 2
                        + ...
                        + Learning Rate * Decision Tree Residual Prediction n
    10. Testing - the test data is given to the model that achieved the minimum residuals over the iterations (see the sketch after this list)
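
    A minimal from-scratch sketch of these steps (editor's illustration, not the video's code; squared-error loss is assumed, so each tree fits the plain residuals, and the data values are made up):

        import numpy as np
        from sklearn.tree import DecisionTreeRegressor

        X = np.array([[2.0], [2.5], [3.0], [4.0]])   # experience (years)
        y = np.array([50.0, 70.0, 80.0, 100.0])      # salary (k); mean is 75

        learning_rate = 0.1
        base_prediction = y.mean()                   # steps 2-3: base learner
        prediction = np.full_like(y, base_prediction)
        trees = []

        for _ in range(50):                          # steps 5-9: sequential trees
            residuals = y - prediction               # steps 4/8: residuals
            tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
            trees.append(tree)
            prediction += learning_rate * tree.predict(X)  # step 7 update

        # step 10: predict for unseen data
        x_new = np.array([[3.5]])
        y_new = base_prediction + learning_rate * sum(t.predict(x_new) for t in trees)
        print(y_new)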

    • @sachingupta5155
      @sachingupta5155 2 years ago

      Thanks, man, for the notes.

    • @avikshitbanerjee1
      @avikshitbanerjee1 a year ago +2

      Thanks for this. But a slight correction on step 6, as salary is never treated as an independent variable.

  • @thetensordude
    @thetensordude 10 months ago +1

    For those who are learning about boosting, here's the crux.
    In boosting, we first build high-bias, low-variance (underfitting) models on our dataset, then we compute the error of this model with respect to the output. The second model we build should then approximate the error that the first model makes.
    second_model = first_model + (optimisation: find a model which minimises the error that the first model makes)
    This methodology works because as we keep building models the error gets minimised, hence the bias reduces. So we get a robust model.
    Going a bit more in depth: instead of computing the error we compute the pseudo-residual, because the pseudo-residual is proportional to the error, and we can minimise any loss.
    So the model becomes:
    model_m = model_(m-1) - learning_rate * [derivative of the loss function with respect to model_(m-1)]
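
    A small numerical check of this point (editor's sketch, not the commenter's code): for squared-error loss L = 0.5*(y - F)^2, the negative gradient -dL/dF equals the ordinary residual y - F, which is why fitting trees to residuals is just the squared-loss special case of fitting them to gradients:

        import numpy as np

        y = np.array([50.0, 70.0, 80.0])   # targets
        F = np.array([75.0, 75.0, 75.0])   # current model output

        residual = y - F                   # plain residuals
        negative_gradient = -(F - y)       # -dL/dF for L = 0.5*(y - F)**2
        print(np.allclose(residual, negative_gradient))  # True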

  • @rishabs5991
    @rishabs5991 4 years ago +30

    Awkward moment when Krish estimates the average value to be 75 and it actually turns out to be 75!

  • @legiegrieve99
    @legiegrieve99 a year ago

    You are a lifesaver. I am watching all of your videos to prepare for my exam. Well done. You are a good teacher. 🌟

  • @syncreva
    @syncreva a year ago +2

    You are literally the best teacher I've ever had... Thank you so much for this dedication, sir. It really means a lot ✨✨

  • @mambomambo4363
    @mambomambo4363 4 years ago +4

    Hello sir, I am a college student and ML enthusiast. I have followed your videos and have recently completed Andrew Ng's course on ML. Having done that, I think I have a broader perspective on ML. Now I am keen to crack GSoC in the field of ML, but I have no idea how to do so. Additionally, I don't even know how much knowledge I need. Going through answers on Quora didn't help, so I would be quite grateful if you addressed my problem. Waiting to hear from you. Many thanks!!

    • @hritwijkamble9988
      @hritwijkamble9988 a year ago

      What further steps did you take after this in your overall ML learning phase? Please tell.

  • @anishdhane1369
    @anishdhane1369 a year ago

    Machine learning: difficult names but easy concepts 😆 Just kidding, thanks a lot, sir!!!

  • @baskarkevin1170
    @baskarkevin1170 4 years ago +3

    You are turning complex concepts into easy ones.

  • @sandipansarkar9211
    @sandipansarkar9211 3 years ago

    Watched it again. Very important for product-based companies.

  • @shahbhazalam1777
    @shahbhazalam1777 4 years ago +5

    Wonderful...!! Waiting for the third part (SVM kernel trick); please upload it as soon as possible.

  • @donbosco915
    @donbosco915 4 years ago +10

    Hi Krish. Love the content on your channel. Could you do a project from scratch which includes PCA, data normalization, feature selection, and feature scaling? I did see your other projects but would love to see one that implements all of these concepts.

  • @mohittahilramani9956
    @mohittahilramani9956 a year ago

    Sir, you are a lifesaver, what a great teacher… Your voice just fits in the mind while self-learning as well.

  • @xruan6582
    @xruan6582 3 years ago

    You saved my life in this era of information/algorithm explosion.

  • @nareshjadhav4962
    @nareshjadhav4962 4 years ago +3

    Excellent, Krish... Now I am eagerly waiting for XGBoost (my favourite algorithm).

  • @Zelloss67
    @Zelloss67 5 months ago

    @krishnaik06 could you please comment:
    Where is the gradient, by the way? As far as I know, in real gradient boosting we teach the weak learners (the r_i trees) not to predict the residual but to predict the gradient of the loss function with respect to y_hat_i. This gradient is later multiplied by the learning rate, and the step size is thus obtained.
    Why predict the gradient instead of just residuals?
    1) We can use a complex loss function with logical conditions. For example -10x if x2. Thus we punish the model with a negative score if y_hat_i is lower than 0.
    This is the major reason.
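
    An editor's sketch of the distinction (assumed example, not from the video): for squared error the pseudo-residual is the residual itself, but for, say, absolute error it is only the sign of the residual, so the next tree chases the direction of the error rather than its raw size:

        import numpy as np

        y = np.array([50.0, 70.0, 120.0])
        F = np.array([75.0, 75.0, 75.0])   # current model output

        pr_squared = y - F                 # -dL/dF for L = 0.5*(y - F)**2
        pr_absolute = np.sign(y - F)       # -dL/dF for L = |y - F|; the
                                           # outlier at 120 no longer dominates
        print(pr_squared, pr_absolute)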

  • @ckeong9012
    @ckeong9012 a year ago

    There are no words to express how excellent this video is. Thanks, sir.

  • @IamMoreno
    @IamMoreno 2 years ago

    You simply have the gift of transmitting knowledge; you are awesome! Please share a video about SHAP values.

  • @priyabratamohanty3472
    @priyabratamohanty3472 4 years ago +2

    I think you saw my comment on the previous video, where I requested an upload on gradient boosting.
    Thanks for uploading.

  • @nehabalani7290
    @nehabalani7290 3 years ago

    Great job!!! I really like the example used to explain what is actually happening to the input values. An overall technical understanding is easily available on YouTube channels, but this example really changes the way I look at GBM after years of using it.

  • @sivareddynagireddy56
    @sivareddynagireddy56 2 years ago

    Many thanks, Krish; you explain in a simple, lucid way.

  • @ANUBHAVSAHAnullRA
    @ANUBHAVSAHAnullRA 4 years ago +2

    Now this is quality content!
    Sir, can you please make videos on XGBoost like this?

  • @ruthvikrajam.v4303
    @ruthvikrajam.v4303 3 years ago

    Krish, 75 is the right value, man; you are perfect.

  • @ManishKumar-qs1fm
    @ManishKumar-qs1fm 4 years ago +1

    Sir, I watch each and every video on your channel, even many times. Please make a video on an end-to-end project with imbalanced datasets. You did make a video on this, but you didn't deal with the imbalanced data there; you used another technique. Please make one video for me.
    Awesome, in a word 👍

  • @DS_AIML
    @DS_AIML 4 years ago +1

    Great, Krish. Waiting for Parts 2, 3, and 4.

  • @Fsp01
    @Fsp01 4 years ago

    The voice of a guy who knows his stuff.

  • @kabilarasanj8889
    @kabilarasanj8889 3 years ago

    This is a super-simplified explanation. Thanks for this video, Krish.

  • @stephanietorres3842
    @stephanietorres3842 2 years ago

    Excellent video, Krish, congrats! It's really clear.

  • @sohailhosseini2266
    @sohailhosseini2266 2 years ago

    Thanks for the video!

  • @newbienate
    @newbienate 10 months ago +1

    Should the sum of all learning rates be 1? Or close to 1? Because I believe only that way can we prevent overfitting and still get closest to the true functional approximation.

  • @ex0day
    @ex0day 2 years ago

    Awesome explanation, bro!!! Thanks for sharing your knowledge.

  • @BatBallBites
    @BatBallBites 4 years ago +1

    Sir, I am from Pakistan, a big fan. Thanks for all the data science stuff, and especially for this video. Waiting for the other 3 parts.

  • @sachinborgave8094
    @sachinborgave8094 4 years ago +1

    Thanks, Krish... Also, please complete the Deep Learning playlist.

  • @sairajesh5413
    @sairajesh5413 4 years ago +2

    Hey... Superb... Dude, this is really awesome...

  • @surendermohanraghav8998

    Thanks for the video. I am not able to find the 3rd part, on the classification problem.

  • @user-of1ll3dy4h
    @user-of1ll3dy4h 10 months ago

    Really helpful

  • @inderaihsan2575
    @inderaihsan2575 11 months ago

    Thank you very, very much!

  • @phanik377
    @phanik377 3 years ago

    1) I think the learning rate wouldn't change, so it is just 'alpha', not a separate 'alpha1' and 'alpha2' for every decision tree.
    2) The trees are predicting residuals. It is not necessary that the residuals reduce at every iteration; they may increase for some observations. For example, for a data point whose target is 100, the residuals have to increase.

  • @pramodtare480
    @pramodtare480 4 years ago +2

    It is crystal clear, thanks for the video.
    Actually, I want to know about the membership: does it include deep learning and NLP,
    and what kind of content will you be sharing?
    Thank you

    • @krishnaik06
      @krishnaik06  4 years ago +1

      You will get access to live projects and materials created by me...

  • @sumitgalyan3844
    @sumitgalyan3844 3 years ago

    You teach awesome, bro. Love from Bangalore.

  • @vipinmanikkoth4245
    @vipinmanikkoth4245 4 years ago

    As always, awesome...! Waiting for Part 2!!

  • @raom2127
    @raom2127 2 years ago

    Sir, your videos are a real value-added asset, really good to listen to. In coming videos, can you please cover the topics to learn separately for ML and Deep Learning?

  • @nischalsubedi9432
    @nischalsubedi9432 3 years ago

    good video

  • @ruthvikrajam.v4303
    @ruthvikrajam.v4303 3 years ago

    Awesome, Naik.

  • @akokari
    @akokari 3 years ago

    In the formula you computed, either i should go from 0 to n with lambda_0 = 1, or just add h0(x).

  • @madhureshkumar
    @madhureshkumar 4 years ago

    Nicely explained... Thanks for the video.

  • @pallavisaha3735
    @pallavisaha3735 3 years ago +1

    3:02
    How are you assuming that for all x1, x2 the predicted y is always 75? The hypothesis is a function of x1, x2. How can it be a constant?

  • @oriabnu1
    @oriabnu1 4 years ago +2

    Asynchronous Stochastic Gradient Descent: does it work like a parallel decision tree? Please make a video on this algorithm; there is no standard material available on it. How can it be implemented on image data? I would be thankful to you.

  • @nikhilagarwal2003
    @nikhilagarwal2003 4 years ago +1

    Hi Krish. Thanks for making such complex techniques easier to understand. I have a query, though. Can we use techniques such as AdaBoost, Gradient Boosting, and XGBoost with linear and logistic regression models rather than trees?
    If yes, is the output the final model coefficients, or additive models just like with trees?
    Thanks in advance.

  • @koustavdutta5317
    @koustavdutta5317 4 years ago +2

    Sir, your video on the SVM kernel trick for non-linear separation never came. Please try to make a video and thus complete the SVM part.

  • @rog0079
    @rog0079 4 years ago +1

    Waiting eagerly for deep NLP videos :D

  • @sandeepganage9717
    @sandeepganage9717 3 years ago

    2:29 75 was actually the right number :-D

  • @sandipansarkar9211
    @sandipansarkar9211 3 years ago

    Great explanation, Krish. Thanks.

  • @vikasrana1732
    @vikasrana1732 4 years ago +1

    Hi Krish, great work, man... I just want to know if you could upload "how to build a data pipeline in GCP".
    Thanks

  • @baharehghanbarikondori1965

    Amazing explanation, thank you

  • @satpremsunny
    @satpremsunny 3 years ago +1

    Hi Krish. I wanted to know how the algorithm computes multiple learning rates (L1, L2, ..., Ln) when we specify only a single learning rate while initializing GBRegressor() or GBClassifier(). We specify only a single learning rate at initialization, right? Please feel free to correct me if I am wrong...
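
    A minimal usage sketch (editor's note, with made-up data): scikit-learn's GradientBoostingRegressor takes a single constant learning_rate that scales every tree's contribution; it does not fit a separate alpha per tree:

        import numpy as np
        from sklearn.ensemble import GradientBoostingRegressor

        X = np.array([[2.0], [2.5], [3.0], [4.0]])
        y = np.array([50.0, 60.0, 70.0, 80.0])

        # one learning_rate shared by all n_estimators trees
        model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1)
        model.fit(X, y)
        print(model.predict([[3.5]]))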

  • @mattmatt245
    @mattmatt245 4 years ago +1

    What's your opinion about tools like Orange or KNIME? Why do we need to learn Python if we have those?

  • @shadiyapp5552
    @shadiyapp5552 a year ago

    Thank you♥️

  • @sandeepmutkule4644
    @sandeepmutkule4644 3 years ago

    h0(x) is not included in the summation sum_{i=1..n} alpha_i * h_i(x). Should it be like this? --->
    F(x) = h0(x) + sum_{i=1..n} alpha_i * h_i(x)

  • @kasinathrajesh52
    @kasinathrajesh52 4 years ago +10

    Sir, I am a 17-year-old. I have been earning some certificates and doing some projects, so is it possible to get hired if I continue like this at this age?

  • @TEJASWI-yj1gi
    @TEJASWI-yj1gi 4 years ago +2

    Hi Krish, can you help me find a way to learn machine learning, because I'm new to this domain? I have started doing a master's project in it. For the thesis, I have tried a lot but couldn't make it. Could you help with it, please? That would be really helpful to me.

  • @harshbordekar8564
    @harshbordekar8564 2 years ago

    Great work! Thanks!

  • @ronaksengupta6174
    @ronaksengupta6174 4 years ago +1

    Thank you sir 😌

  • @oguzcan7199
    @oguzcan7199 2 years ago

    Why does the first base model create the mean of the salary? Just as an example?

  • @pratikbhansali4086
    @pratikbhansali4086 3 years ago

    Sir, just as you made one complete video on optimisers, please try to make one video on loss functions also.

  • @datafuse32
    @datafuse32 4 years ago +1

    Can anybody explain why we need to learn the inner workings and loops of various algorithms such as linear regression and logistic regression, when we can directly call a function and apply it in Python? Please explain.

  • @samarendrapradhan5067
    @samarendrapradhan5067 4 years ago

    Nice, easy-to-understand video.

  • @skc1995
    @skc1995 4 years ago

    Sir, I understand your teachings, and it would be helpful if you addressed Cholesky and quasi-Newton solvers and what they are in optimization, along with gradient descent. Not being from a statistical domain, it is too hard for us to understand these terms.

  • @itplacementprep
    @itplacementprep 3 years ago

    Very well explained

  • @oriabnu1
    @oriabnu1 4 years ago +1

    Asynchronous Stochastic Gradient Descent with Delay Compensation: sir, can you help me understand how this gradient method works, because it is a parallel gradient algorithm?

  • @_ritikulous_
    @_ritikulous_ 2 years ago

    R1 was y - y_hat. How did we calculate R2? Why is it -23?

  • @vishalaaa1
    @vishalaaa1 4 years ago

    Excellent

  • @benjaminbentekelongau8098

    Very helpful, sir.

  • @padmavathiv2429
    @padmavathiv2429 a year ago

    Hi sir, can you please tell me the recent machine learning algorithms for classification?

  • @yukeshnepal4885
    @yukeshnepal4885 4 years ago

    Again, heartfelt thanks, sir 👌👌

  • @noushanfarooqi36
    @noushanfarooqi36 4 years ago

    This is one of the best explanations of gradient boosting. Will you be doing a video on XGBoost soon?

  • @Chkexpert
    @Chkexpert 3 years ago +1

    Krish, that was great content. I would like to know: where exactly does the algorithm stop? In the case of random forest, it is controlled by max_depth, min_samples_split, etc. What is the parameter that makes gradient boosting stop?
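
    A hedged pointer from the editor: in scikit-learn the boosting loop runs for at most n_estimators rounds and can stop earlier via the n_iter_no_change / validation_fraction early-stopping options, while tree size is controlled separately (max_depth etc.). A sketch:

        from sklearn.datasets import make_regression
        from sklearn.ensemble import GradientBoostingRegressor

        X, y = make_regression(n_samples=200, n_features=4, random_state=0)

        model = GradientBoostingRegressor(
            n_estimators=500,         # hard cap on boosting rounds
            max_depth=3,              # size of each individual tree
            learning_rate=0.05,
            validation_fraction=0.1,  # held-out split for early stopping
            n_iter_no_change=10,      # stop when validation score stalls
        )
        model.fit(X, y)
        print(model.n_estimators_)    # rounds actually used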

  • @phaniraju0456
    @phaniraju0456 3 years ago

    Marvellous approach :)

  • @sunnyghangas4391
    @sunnyghangas4391 3 years ago

    Perfectly explained!!

  • @uttejreddypakanati4277

    Hi Krish, thank you for the videos. In the example you took for gradient boosting, I see the target has numeric values. How does the algorithm work when the target has categorical values (e.g. the Iris dataset)? How does the first step of calculating the average of the target values happen?
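
    A hedged note from the editor: for classification, gradient boosting does not average the targets; the initial prediction is the class log-odds, each tree fits the gradient of the log-loss, and probabilities come from squashing the summed scores. A minimal sketch on Iris:

        from sklearn.datasets import load_iris
        from sklearn.ensemble import GradientBoostingClassifier

        X, y = load_iris(return_X_y=True)

        clf = GradientBoostingClassifier(n_estimators=50, learning_rate=0.1)
        clf.fit(X, y)
        print(clf.predict(X[:3]))                 # class labels
        print(clf.predict_proba(X[:3]).round(2))  # probabilities from summed scores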

  • @glaswasser
    @glaswasser 3 years ago

    Cool, man. Nice, dude. You totally rock!!

  • @rajatjain4478
    @rajatjain4478 4 years ago

    Great Explanation!

  • @Fun-and-life438
    @Fun-and-life438 4 years ago +1

    Sir, do you provide any certificate programs online?

  • @tanvibamrotwar
    @tanvibamrotwar a year ago

    Hi sir, in the generalised formula h0(x) is missing, because you take the range from 1 to n. Or am I getting it wrong?

  • @ajaybandlamudi2932
    @ajaybandlamudi2932 2 years ago

    I have a question, could you please answer it: what are the differences and similarities between Generalised Linear Models (GLMs) and Gradient Boosted Machines (GBMs)?

  • @jadhavsourabh
    @jadhavsourabh 2 years ago

    Sir, generally we scale all the trees with the same alpha value, right???

  • @aashishdagar3307
    @aashishdagar3307 3 years ago

    Hello sir, @6:10 the decision tree predicts on the given features with R1 as the target; if R2 is -23, does that mean the decision tree predicted +2, so that R2 -> -25 + 2 = -23? Is that so?
    And the final model is h0(x) + h1(x) + ... ??

    • @mranaljadhav8259
      @mranaljadhav8259 3 years ago

      Same question, did you get the answer? Please let me know how to calculate R2.
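
      An editor's arithmetic sketch, assuming the video's values (y = 50, base prediction 75, learning rate 0.1, and a first tree predicting the full residual -25):

          y, f0, lr = 50.0, 75.0, 0.1
          r1 = y - f0              # -25.0, the first residual
          f1 = f0 + lr * r1        # 72.5, updated prediction
          r2 = y - f1              # -22.5, close to the -23 quoted
          print(r1, f1, r2)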

  • @maheshpatil298
    @maheshpatil298 3 years ago

    Is it correct that the base model could be any ML model, e.g. KNN, LR, logistic regression, SVM?
    Is gradient boosting a kind of regularization?

  • @ankiittalwaarin
    @ankiittalwaarin a year ago

    I could not find your videos about gradient boosting for classification... Can you share the link?

  • @tadessekassu2799
    @tadessekassu2799 a year ago

    Krish, please can you share how I can generate rules from models in ML?

  • @SAINIVEDH
    @SAINIVEDH 3 years ago

    Why do the residuals keep on decreasing? To my knowledge it's a regression tree; the output may be greater or lower, right?!

    • @SAINIVEDH
      @SAINIVEDH 3 years ago

      They'll decrease as we move closer to the real predictions by adding trees trained on the previous residuals.

  • @architchaudhary1791
    @architchaudhary1791 4 years ago

    I'm 6 years old and follow all your ML tutorial videos. Can I apply for a data science post at this age?

  • @singhamrinder50
    @singhamrinder50 3 years ago

    Hi Krish, how would we calculate the average value when we have to predict the salary for new data, since at that point in time we do not have this value?

  • @neerajpal311
    @neerajpal311 4 years ago

    Hello sir, please make a video on XGBoost. Thanks in advance.

  • @ManuGupta13392
    @ManuGupta13392 3 years ago

    Is this R2 the residual of the second model (i.e. R1 - R1hat), or is it R1hat?

  • @alkeshkumar2227
    @alkeshkumar2227 3 years ago

    Sir, at 9:40, if i varies from 1 to n, then how is the base model output h0(x) included?

  • @aloksingh3440
    @aloksingh3440 2 years ago

    Hi Krish, at 7:33 you call it high variance, but from the training data alone you cannot confirm that. This can be misleading for new learners.

  • @Badshah.469
    @Badshah.469 4 years ago +1

    Great video, sir, but why is it called 'gradient'???

  • @ajayrana4296
    @ajayrana4296 3 years ago

    How will it work on a classification problem?

  • @shreyasb.s3819
    @shreyasb.s3819 3 years ago

    What is the base model here? Is that also a decision tree?