Maths behind XGBoost | XGBoost algorithm explained with Data Step by Step

  • Added 4 Sep 2024

Comments • 173

  • @prateeksachdeva1611
    @prateeksachdeva1611 1 year ago +4

    This channel has become one of my favorite platforms to learn ml, owing to the crisp explanation by Aman.

  • @cenxuneff
    @cenxuneff 10 days ago

    Excellent explanation

  • @oluwafemiolasupo4018
    @oluwafemiolasupo4018 1 month ago

    Nice one here. Thank you for the simplicity employed in explaining the core concepts.

  • @ahmedidris305
    @ahmedidris305 1 year ago +1

    At 10:27, I don't understand why the similarity score after the split is affected by changing the lambda value before the split: why would the similarity score after the split go down? As I understood from the video, the split rule has nothing to do with the lambda value, so if the lambda value changes, the split remains the same. The only thing that changes is the gain: when the lambda value goes up, the similarity score before the split decreases and the gain increases, because the deducted value (the similarity score before the split) decreases as lambda gets higher.
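
    One way to resolve the doubt above: lambda appears in the denominator of every leaf's similarity score, so the child-leaf scores after the split also shrink as lambda grows, not only the pre-split score. A minimal Python sketch, with made-up residual values (not the exact ones from the video):

    def similarity(residuals, lam):
        # XGBoost similarity score: (sum of residuals)^2 / (n + lambda)
        return sum(residuals) ** 2 / (len(residuals) + lam)

    left, right = [-10.0, -7.0], [4.0, 8.0]   # hypothetical leaf residuals
    root = left + right

    for lam in (0.0, 1.0, 10.0):
        gain = similarity(left, lam) + similarity(right, lam) - similarity(root, lam)
        print(lam, similarity(root, lam), similarity(left, lam),
              similarity(right, lam), gain)

    Running this shows every score (root and both children) dropping as lambda rises; what lambda does to the gain depends on the relative leaf sizes.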

  • @nikhilpawar7876
    @nikhilpawar7876 2 years ago +1

    Looking at the content & the no. of subscribers: highly underrated.

  • @abiramimuthu6199
    @abiramimuthu6199 4 years ago +8

    Thanks a lot Aman. Great video. Teaching is an art, and you do justice to it every time by breaking the concept down into little steps and explaining it in a way that reaches everyone. Keep up your good work... I am expecting more videos in your NLP playlist.

    • @UnfoldDataScience
      @UnfoldDataScience 3 years ago +2

      Thanks a ton Abirami. Hope you and your family are staying safe and well.

  • @chdoculus
    @chdoculus 1 year ago

    Listened to this video 3 times... lots of insights. Thank you.

  • @animeshbagchi7881
    @animeshbagchi7881 3 years ago

    One of the best explanations of the complex intuition behind XGBoost.

  • @asafjerbi1867
    @asafjerbi1867 2 years ago +1

    Hi,
    excellent explanation, but I have some points which are not clear to me yet.
    1. How do you choose the criteria to split the XGBoost tree by? For instance, you chose 'age

  • @nikhilpawar7876
    @nikhilpawar7876 2 years ago

    Definitely the best & most understandable explanation of XGB🔥

  • @Krishna-pm8ty
    @Krishna-pm8ty 2 years ago

    Very Nice Explanation!

    • @UnfoldDataScience
      @UnfoldDataScience 2 years ago

      Thanks Krishna.

    • @Krishna-pm8ty
      @Krishna-pm8ty 2 years ago

      @@UnfoldDataScience Simplifying the concept without losing the complexity. One of the best explanations on YouTube. Your channel really deserves more visibility. All the very best, Aman.

  • @santoshvjadhav
    @santoshvjadhav 10 months ago

    Great video, Sir. You have explained it clearly and in a very simple way. Thanks a lot 🙏

  • @maruthiprasad8184
    @maruthiprasad8184 8 months ago

    Superb, simple explanation. Thank you very much.

  • @user-px3ux9we6e
    @user-px3ux9we6e 8 months ago

    Thanks a lot for this excellent video! I am still curious about how XGBoost can achieve parallelization and how it handles missing values, as you mentioned before. Looking forward to your new videos!

  • @sachin29596
    @sachin29596 2 years ago +3

    Sir, in the formula new prediction = old prediction + learning rate * output, I didn't understand how we get the output value of 6 for the second record. Could you explain once again?

    • @srk5702
      @srk5702 7 months ago

      The formula is: output = sum of residuals / number of residuals (with lambda = 0).
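
      A tiny sketch of the update being discussed. The residuals 4 and 8 and lambda = 0 are quoted from the video in this thread; the old prediction of 30 and the learning rate of 0.3 are inferred from the arithmetic in other comments, so treat them as assumptions:

      residuals = [4, 8]   # residuals landing in this leaf
      lam = 0

      # Leaf output = sum of residuals / (number of residuals + lambda)
      output = sum(residuals) / (len(residuals) + lam)   # (4 + 8) / 2 = 6.0

      learning_rate = 0.3
      old_prediction = 30
      new_prediction = old_prediction + learning_rate * output
      print(output, new_prediction)   # 6.0 31.8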

  • @ruchitagarg4871
    @ruchitagarg4871 3 years ago

    Very nicely explained, thanks Sir. One of the best videos I have seen on YouTube.

  • @samarkhan2509
    @samarkhan2509 3 years ago

    Very nice video. Brief, concise, to the point. Agree with others: probably the best explanation so far on YouTube. Way to go, bro.

    • @UnfoldDataScience
      @UnfoldDataScience 3 years ago

      Your comments are my motivation, Samar. Thanks for motivating.

  • @killeraudiofile8094
    @killeraudiofile8094 3 years ago +1

    Thanks a lot for this. Very helpful for me as I am brushing up on ML theory for interviewing. Awesome work!

  • @sachink9102
    @sachink9102 7 months ago +1

    Q1. How do we interpret the Similarity Score?
    Q2. What is the meaning of a high Similarity Score vs. a low Similarity Score?

  • @miroslavstimac4384
    @miroslavstimac4384 3 years ago +1

    Excellent explanation.

  • @sandipansarkar9211
    @sandipansarkar9211 2 years ago

    Finished watching.

  • @sadhnarai8757
    @sadhnarai8757 2 years ago

    Very good Aman

  • @vivekkumaryadav8802
    @vivekkumaryadav8802 2 years ago

    YOU ARE TRUE KNOWLEDGE

  • @subhz1
    @subhz1 3 years ago +2

    Nice explanation!!!!
    Can you please make a video on XGBoost / Gradient Boost where the dependent variable is binary/categorical in nature, say Good/Bad (0,1)?

  • @Madhuram_Qualityoflife

    I like the way you explain complex concepts in a simple way. Thanks.

  • @preranatiwary7690
    @preranatiwary7690 4 years ago +1

    Hey, good one again! Continue your good work... Thanks.

  • @datadriven597
    @datadriven597 2 years ago

    Awesome in-depth explanation, keep up the good work, man!

  • @babusivaprakasam9846
    @babusivaprakasam9846 3 years ago

    Straight to the point. Thanks

  • @ranajaydas8906
    @ranajaydas8906 3 years ago +6

    Sir, can you please tell why you didn't square the SR at 12:08?
    And can you tell how the output at 14:02 is 6?
    What does the output actually mean?

    • @avneshdarsh9880
      @avneshdarsh9880 3 years ago +4

      The output value is calculated as the average of the residuals, in our case (4+8)/2 = 6.

    • @akashkewar
      @akashkewar 3 years ago +1

      1) We square the sum of residuals when we compute the similarity score, not when we make a prediction.
      2) As we are making a prediction, and assuming lambda is 0, the prediction is just the average of all the values (residuals) in that particular leaf.
      3) The output means residuals: we predict the residuals (because it is a boosting algorithm) such that the weighted sum of all the residuals is as close to the target variable as possible (the final prediction of our model).

    • @utkarshsaboo45
      @utkarshsaboo45 2 years ago

      @@akashkewar Are you sure we "square the sum" and not "sum the squares"? The "square of the sum" in the video doesn't make sense!

    • @avinashajmera2775
      @avinashajmera2775 1 year ago

      Hi Aman,
      please clear this up.
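
      Since this 6-versus-72 confusion recurs throughout the comments below, here is a minimal sketch separating the two formulas: the similarity score squares the sum of residuals, while the leaf output (the "output" used in the prediction update) does not. The numbers are the ones quoted in this thread:

      residuals = [4, 8]   # residuals in the right leaf, as quoted above
      lam = 0

      # Similarity score: used only to evaluate and choose splits.
      similarity = sum(residuals) ** 2 / (len(residuals) + lam)   # 144/2 = 72.0

      # Leaf output: used in new prediction = old prediction + eta * output.
      output = sum(residuals) / (len(residuals) + lam)            # 12/2 = 6.0

      print(similarity, output)   # 72.0 6.0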

  • @HrisavBhowmick
    @HrisavBhowmick 3 years ago +2

    12:01: why not the square of the sum of residuals, as you said in the formula?

  • @rafibasha4145
    @rafibasha4145 1 year ago

    Hi Aman, thanks for the video. Please explain how lambda controls overfitting.

  • @SunilKumar-mz6kr
    @SunilKumar-mz6kr 3 years ago

    Great explanation

    • @UnfoldDataScience
      @UnfoldDataScience 3 years ago +1

      Glad it was helpful, Sunil. You're very welcome, Goundo. If possible, please share the link within data science groups. Thanks again.

  • @Vipulghadi
    @Vipulghadi 1 year ago

    Sir, you take only one feature for prediction. What if the data has more than one feature? On which criteria does the model select the feature? Is an information-gain-like approach used, or some other approach?
    Please explain, sir.

  • @parvsharma8767
    @parvsharma8767 3 years ago

    Thanks a lot, brother... God bless you for your information.

  • @omarsalam7586
    @omarsalam7586 11 months ago

    Thank you.
    Could you explain how to do feature importance using XGBoost?
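
    For anyone with the same question while waiting for a reply: the xgboost Python package exposes importance scores on a fitted model. A minimal sketch with synthetic data (the array shapes and values are made up for illustration):

    import numpy as np
    import xgboost as xgb

    # Synthetic data purely for illustration.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = 3 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200)

    model = xgb.XGBRegressor(n_estimators=50)
    model.fit(X, y)

    # One importance score per feature; gain-based scores are also
    # available via model.get_booster().get_score(importance_type="gain").
    print(model.feature_importances_)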

  • @logeshr4923
    @logeshr4923 1 month ago

    Can you do one for XGBoost classification?

  • @chdoculus
    @chdoculus 1 year ago

    One question: what is the output in the last formula for the new prediction? Which output is it?

  • @pradeeppaladi8513
    @pradeeppaladi8513 1 year ago

    Thanks a lot for the lecture. Can you please clarify what happens in the case of a classification problem? I mean, what about the residuals in a classification problem, as there will be no residuals in them? How do we interpret these learnings for a classification problem?

  • @ahteshaikh1193
    @ahteshaikh1193 1 year ago

    Thanks for the excellent work!!

  • @JoshDenesly
    @JoshDenesly 4 years ago +2

    This is the best video on XGBoost.

  • @surajprusty6904
    @surajprusty6904 2 years ago

    If we take the mean as the criterion, then the sum of the residuals will always be zero if the values are taken as they are (with signs).

  • @akshatagrawal6701
    @akshatagrawal6701 1 year ago

    Dear Aman ji, one question please... The SS value is the SR squared, but when you are calculating for the value 11 you only take the sum of residuals and do not square it, so please explain how it comes to 6. If we do square the SR, the value would be different.

    • @UnfoldDataScience
      @UnfoldDataScience 1 year ago

      I will check - I may possibly have made a mistake. Did you check the previous comments?

    • @akshatagrawal6701
      @akshatagrawal6701 1 year ago

      @@UnfoldDataScience Thanks, Aman ji, for reading your viewers' comments and respecting their doubts. I think in one comment you gave the full paper link and one more link for more detail, so I will check from there... thanks.

  • @sumitkumardash119
    @sumitkumardash119 3 years ago +3

    Can you just describe the loss function for it?

  • @ambarkumar7805
    @ambarkumar7805 2 years ago

    Is the procedure the same for classification?

  • @ganeshkharad
    @ganeshkharad 11 months ago

    too good...!!

  • @debojitmandal8670
    @debojitmandal8670 1 year ago

    Hello sir, I didn't follow the concept of how it handles outliers. You said it handles outliers, but you have not explained how. As lambda increases, the similarity score decreases, but how is that taking care of the outliers? I didn't follow it, as I couldn't understand the relationship between them.
    Second, let's say a new data point comes in, i.e. 11, so it goes to the branch greater than 10. Will a new similarity score be computed again, because you now have a 3rd data point, i.e. 11?
    So (4+8+11)^2/(3+0)?

  • @pacsSaanihaamariyam
    @pacsSaanihaamariyam 1 year ago

    In the end, the new prediction value is subtracted from the IQ to find the new residual value. When new predictions are done, why is the residual value calculated only for 34 and not for 20 and 38?

  • @tempura_edward4330
    @tempura_edward4330 3 years ago

    Very clear ! Thank you !✨🙏

    • @UnfoldDataScience
      @UnfoldDataScience 3 years ago +1

      You’re welcome 😊. Please share my videos in various data science groups you are part of, that will motivate me to create more content :)

  • @cgqqqq
    @cgqqqq 3 years ago +1

    you are a god...

  • @dhineshmathiyalagan6415

    Very informative. Thanks for explaining the concept such that it is easily understood. I just want to understand the effect of an outlier on the base value (Model 0). Since the mean value (which is high in the presence of an outlier) is used initially to calculate the residuals and for prediction, wouldn't it have a greater impact? Please share your insights.

    • @UnfoldDataScience
      @UnfoldDataScience 3 years ago

      Yes, exactly Dhinesh, there will be an outlier impact, hence it's better to take care of it before starting training.

  • @atomicbreath4360
    @atomicbreath4360 3 years ago

    Sir, what exactly is the difference between the base-model trees created in gradient boosting and in XGBoost? Does gradient boosting also use the formula you have shown in the video?

  • @awanishkumar6308
    @awanishkumar6308 3 years ago +1

    Can we apply L1 and L2 regularization techniques to any algorithm, whether it's linear regression, XGBoost, gradient boost, random forest, etc.?

    • @UnfoldDataScience
      @UnfoldDataScience 3 years ago

      Not directly; there are different regularization parameters we can tune in the various algorithms.
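
      As one concrete instance of the reply above: in XGBoost specifically, the L1 and L2 penalties on leaf weights are exposed as the reg_alpha and reg_lambda parameters. A minimal sketch with synthetic data (the values are arbitrary):

      import numpy as np
      import xgboost as xgb

      rng = np.random.default_rng(42)
      X = rng.normal(size=(100, 4))
      y = X @ np.array([2.0, -1.0, 0.0, 0.5]) + rng.normal(size=100)

      # reg_alpha is the L1 penalty, reg_lambda the L2 penalty (the
      # "lambda" from the similarity-score formula in the video).
      model = xgb.XGBRegressor(reg_alpha=0.1, reg_lambda=1.0)
      model.fit(X, y)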

  • @shanmukhchandrayama3903

    Sir, can you please explain how XGBoost works for logistic regression?

  • @JoshDenesly
    @JoshDenesly 4 years ago

    Please make a video on the "Pipeline" of building a model and how it is implemented in production.

  • @akhilnooney534
    @akhilnooney534 3 years ago +1

    Do we calculate IG and entropy as the splitting criteria?

  • @mayanksriv00
    @mayanksriv00 3 years ago

    Sir, please do cover LightGBM and its advantages over XGBoost.

  • @nishidutta3484
    @nishidutta3484 3 years ago +1

    Hey Aman, you talked about missing-value treatment in XGBoost in your previous video... how does XGBoost treat missing values?

    • @UnfoldDataScience
      @UnfoldDataScience 3 years ago +3

      Hi Nishi, sorry for the late reply. That would be a slightly long explanation. Please check the link below to understand more:
      datascience.stackexchange.com/questions/15305/how-does-xgboost-learn-what-are-the-inputs-for-missing-values
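
      A quick practical illustration of the linked answer: XGBoost learns a default direction for missing values at each split, so NaNs can be passed in directly. A toy sketch (the data is made up):

      import numpy as np
      import xgboost as xgb

      # Toy data with missing entries; no imputation is needed, since
      # each split learns which branch NaNs should be routed down.
      X = np.array([[25.0], [np.nan], [13.0], [np.nan], [40.0], [8.0]])
      y = np.array([30.0, 34.0, 20.0, 38.0, 36.0, 18.0])

      model = xgb.XGBRegressor(n_estimators=10)
      model.fit(X, y)
      print(model.predict(np.array([[np.nan]])))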

  • @letsplay0711
    @letsplay0711 2 years ago

    14:22, I think the output is (12) squared: 144/(2+0) = 72. Please correct me if I'm wrong...

  • @drsivalavishnumurthy34

    Sir, nice video. Please make a video where the dependent variable is categorical, that is, yes or no.

  • @jatin7836
    @jatin7836 3 years ago +1

    Very explanatory video, great work, bro. I just need to ask one thing about that output at the end: how did we get 6 as the output? Because (4+8)^2 / (2+0) = 72; if we do not square it, we get 6. But the formula is with the square, right? So how did we get 6 as the output? It must be something else (maybe 72, I think). Please explain.

    • @UnfoldDataScience
      @UnfoldDataScience 3 years ago

      Thanks Jatin, will check that. Thanks for pointing it out.

    • @himanshuarora6822
      @himanshuarora6822 3 years ago +1

      Hi Aman, Jatin is correct. It should be 72 instead of just 6. If we take 72, the value of the residual is (34 - 51.6) = -17.6.
      Please see and suggest if I am correct. Also, the value of the residual is decreasing in this case from 4 to -17.6. How do we further reduce it so that it is closer to 0?

    • @r.h.5172
      @r.h.5172 3 years ago

      @@himanshuarora6822 I have the same doubt. Is this cleared up somewhere? Aman, could you please explain?

    • @avneshdarsh9880
      @avneshdarsh9880 3 years ago

      The output value is calculated as the average of the residuals, in our case (4+8)/2 = 6.

  • @jayitabhattacharyya4313
    @jayitabhattacharyya4313 3 years ago +2

    Where is the output 6 coming from? The similarity score for the 2nd branch was 72 according to your formula. I fail to understand, please help.

    • @harshagarwal8170
      @harshagarwal8170 3 years ago +1

      You can see from the previous tree: (4+8)/(2+0) = 6. Here lambda is 0, as he said...

    • @tempura_edward4330
      @tempura_edward4330 3 years ago

      I think he means that the new prediction is just (4+8)/#R, not the similarity score. I got confused too. 😁

  • @vishnukv6537
    @vishnukv6537 3 years ago

    good explanation :)

  • @TarashankarSenapati-yz8rv

    Sir, how does 6 come about? You missed squaring the sum of 4 and 8; please tell me.

  • @lifeisbeautiful1111
    @lifeisbeautiful1111 9 months ago

    Hi, can you please explain how the output is 6 for the second observation in the table?

    • @ujjwalgoel6359
      @ujjwalgoel6359 7 months ago

      Yes, I was looking at the same question, because the output for the right node was 72.

    • @ujjwalgoel6359
      @ujjwalgoel6359 7 months ago

      Do you know the answer?

  • @AMVSAGOs
    @AMVSAGOs 3 years ago

    Hi Aman, can you please tell us why the data should be normally distributed, and how it affects the ML models?

  • @Amit-dl4vd
    @Amit-dl4vd 2 years ago

    Where is gradient descent happening in the algorithm?

  • @ppsheth91
    @ppsheth91 4 years ago

    Hello Sir,
    Really a very nice explanation of such a complicated algorithm. There is hardly any video that describes the in-depth intuition of XGBoost... Thanks a lot, Sir.
    One doubt: can you explain how the classification of a new record from the test data set will take place?
    Can you create such videos for CatBoost and LightGBM?

  • @gauravverma365
    @gauravverma365 2 years ago

    Can we generate the mathematical equations between the adopted input and output parameters after a successful implementation of XGBoost?

  • @rajeev264u
    @rajeev264u 4 years ago

    Thanks, Aman, for sharing your knowledge. Great learning. Can you please explain the relation between min_child_weight and gamma? Do we still need to tune min_child_weight if we are using gamma values for tuning, as the tree is getting pruned by using a higher gamma?

    • @UnfoldDataScience
      @UnfoldDataScience 3 years ago

      Hi Rajeev, about tuning your hyperparameters: you should try different combinations to see what works well for your model. We cannot take a generic approach for all data.
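
      In the spirit of "try different combinations", one common approach is a small grid search over gamma and min_child_weight together. A sketch assuming scikit-learn is available; the data and grid values are arbitrary:

      import numpy as np
      import xgboost as xgb
      from sklearn.model_selection import GridSearchCV

      rng = np.random.default_rng(0)
      X = rng.normal(size=(300, 5))
      y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.5, size=300)

      # Tune gamma and min_child_weight jointly, since both influence
      # how aggressively branches are pruned or kept.
      grid = GridSearchCV(
          xgb.XGBRegressor(n_estimators=100),
          param_grid={"gamma": [0, 1, 5], "min_child_weight": [1, 5, 10]},
          cv=3,
      )
      grid.fit(X, y)
      print(grid.best_params_)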

  • @souravbiswas6892
    @souravbiswas6892 4 years ago

    Awesome explanation 👍 although it was a bit complicated. Can you create videos on Poisson regression and survival analysis?

  • @rahuljaiswal141
    @rahuljaiswal141 4 years ago

    Can you make an end-to-end clustering video? How to select the variables, the number of clusters, and then the final deployment.

    • @UnfoldDataScience
      @UnfoldDataScience 4 years ago +1

      Hello, thanks for the feedback. I will note this topic down and create a video in the coming week for sure.

  • @shivanshjayara6372
    @shivanshjayara6372 3 years ago

    I don't understand how the tree decides which feature is to be the root node... If it depends on the I.G., then I've got it. And a second thing: it would be better if you took more than 3 records in that example, like 5-6, because I'm not able to tell whether every row is processed one at a time or all rows at once.

  • @vikram5970
    @vikram5970 2 years ago

    Hi Sir, I could not find the link to 'how gradient boost works', the theoretical explanation. I found the one which explains why XGBoost is fast and has high performance.
    Can you please give me the link to how XGBoost works?

  • @ajaybhatt6820
    @ajaybhatt6820 4 years ago

    Sir, please make videos on RNN and LSTM.

  • @Hu.aventuras
    @Hu.aventuras 2 years ago

    Hi Aman!!! I have a question: how can I predict gender from mobile phone data with an XGBoost algorithm?

    • @UnfoldDataScience
      @UnfoldDataScience 2 years ago

      You need to create data such that your target column is gender, and then you can run an XGBoost classifier.
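
      A minimal sketch of that setup; the feature columns are entirely hypothetical, and the point is only that gender (encoded 0/1) becomes the target column y:

      import numpy as np
      import xgboost as xgb

      # Hypothetical phone-usage features (minutes/day, GB/month,
      # apps installed) with gender encoded as 0/1 in the target y.
      X = np.array([
          [120, 3.5, 40],
          [300, 1.2, 15],
          [90,  4.0, 55],
          [250, 0.8, 10],
      ])
      y = np.array([0, 1, 0, 1])

      clf = xgb.XGBClassifier(n_estimators=10)
      clf.fit(X, y)
      print(clf.predict(np.array([[200, 2.0, 30]])))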

  • @dvp1678
    @dvp1678 2 years ago

    At 7:19 is it not Age < 10 instead of Age > 10?

  • @giridharreddy8113
    @giridharreddy8113 4 years ago

    Probably the best on YouTube. It would be really great if you could make a video on the books you have learnt from and, if possible, provide the book links to Amazon.

    • @UnfoldDataScience
      @UnfoldDataScience 4 years ago

      Thanks Giridhar. On books, please find my recommendation below; you will find links to buy in the description of the same video:
      czcams.com/video/jDwqjmW1Fcg/video.html

  • @harivgl
    @harivgl 3 years ago

    Are all the models M1, M2, etc. the same model, with the same data, tree, and features used?

  • @saurabhdeokar3791
    @saurabhdeokar3791 2 years ago

    In the new prediction, which value do you take as the output?

  • @hiteshyerekar2204
    @hiteshyerekar2204 3 years ago

    Hi Aman, it only changes one residual, i.e. 2.2. What about the rest? How do we get the remaining residuals?

  • @shivanshjayara6372
    @shivanshjayara6372 3 years ago

    I don't understand the formula:
    firstly you used the (sum of residuals) squared,
    &
    secondly you used only the (sum of residuals).
    What is the reason?

  • @rafsunahmad4855
    @rafsunahmad4855 3 years ago

    Is knowing the math behind an algorithm a must, or is knowing how the algorithm works enough? Please, please, please give a reply.

    • @UnfoldDataScience
      @UnfoldDataScience 3 years ago +1

      Knowing the math is a must.

    • @rafsunahmad4855
      @rafsunahmad4855 3 years ago

      I'm confused, because others told me that if I want a job related to research, meaning improving machine learning or creating new algorithms, then I must learn the math behind an algorithm; but for a normal data science job it is enough to know how an algorithm works, and knowing the math behind it is not a must. Please give a reply.

  • @AhmadHassan-on6zq
    @AhmadHassan-on6zq 3 years ago

    🙇‍♂️

  • @januaralamien9421
    @januaralamien9421 3 years ago

    XGBoost = algorithm or framework?
    Please explain.

    • @UnfoldDataScience
      @UnfoldDataScience 3 years ago

      Internally it's a framework; however, the implementation is available in Python, hence we call it an algorithm.

  • @71shubham
    @71shubham 3 years ago +1

    How did we decide on the age splitting criterion?

  • @chanpreetsingh93
    @chanpreetsingh93 4 years ago

    How do we set or calculate the gamma value?

    • @UnfoldDataScience
      @UnfoldDataScience 4 years ago

      Good question. It's subjective, based on how the model is behaving with the data. We can give a range and decide to tune it.

  • @YashSharma-xb2os
    @YashSharma-xb2os 4 years ago

    Hi Aman, please make a Telegram or WhatsApp group where we can connect with you and ask queries.

  • @geetisudhaparida2523
    @geetisudhaparida2523 3 years ago

    Where has the value 6 come from??

  • @tapaspal8623
    @tapaspal8623 4 years ago

    Hi Aman Sir,
    Can you please explain how parallelism happens, since it runs in a sequential manner? The next model requires the previous model's output.
    Thanks,
    Tapas

    • @UnfoldDataScience
      @UnfoldDataScience 4 years ago

      Hi Tapas, the parallelism is not in terms of model training; I was talking about parallelism in terms of hardware, for example using multiple cores of the processor. Not to be confused with the sequential model training.
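
      For context, that hardware-level parallelism is exposed through the n_jobs parameter of the scikit-learn wrapper. A minimal sketch with synthetic data:

      import numpy as np
      import xgboost as xgb

      rng = np.random.default_rng(0)
      X = rng.normal(size=(1000, 20))
      y = rng.normal(size=1000)

      # Boosting is sequential across trees, but within each tree the
      # split search over features and thresholds is spread over cores.
      model = xgb.XGBRegressor(n_estimators=200, n_jobs=4)
      model.fit(X, y)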

  • @navneetgupta4669
    @navneetgupta4669 3 years ago

    The learning rate was 72 (144/2). How did it change to 6?

    • @UnfoldDataScience
      @UnfoldDataScience 3 years ago

      Hi Navneet, can you let me know the time in the video? I will play and check that part.

    • @navneetgupta4669
      @navneetgupta4669 3 years ago

      @@UnfoldDataScience After 12:00, when you added another input (11). You took lambda as zero but forgot to square the numerator.

    • @avneshdarsh9880
      @avneshdarsh9880 3 years ago

      The output value is calculated as the average of the residuals, in our case (4+8)/2 = 6.

  • @bestcakesdesign
    @bestcakesdesign 3 years ago

    Where are you working sir?

  • @rohitkaushik2172
    @rohitkaushik2172 2 years ago +1

    Please don't copy the examples; please present new examples.

  • @sandipansarkar9211
    @sandipansarkar9211 3 years ago

    Great explanation.