This channel has become one of my favorite platforms to learn ML, owing to the crisp explanations by Aman.
Excellent explanation
Nice one here. Thank you for the simplicity employed in explaining the core concepts.
At 10:27 I don't understand why the similarity score after the split is affected by changing the lambda value before the split ("Why will the similarity score after the split go down?"). As I understood from the video, the split rule has nothing to do with the lambda value, so if the lambda value changes, the split remains the same. The only thing that changes is the gain: when the lambda value goes up, the similarity score before the split decreases and the gain increases, because the deducted value (the similarity score before the split) decreases as lambda gets higher.
Looking at the content and the number of subscribers, highly underrated.
Kindly share within your groups, Nikhil, that may help. Thank you.
Thanks a lot Aman. Great video. Teaching is an art and you do justice to it every time by breaking the concept down into small steps and explaining it in a way that reaches everyone. Keep up your good work... I am expecting more videos in your NLP playlist.
Thanks a ton Abirami. Hope you and your family are staying safe and good.
Listened to this video 3 times... lots of insights. Thank you.
One of the best explanations of the complex intuition behind XGBoost...
Thanks Animesh.
Hi,
Excellent explanation, but I have some points which are not clear to me yet.
1. How do you choose the criterion to split the XGBoost tree by? For instance, you chose 'age
Definitely the best and most understandable explanation of XGBoost 🔥
Cheers Nikhil.
Very Nice Explanation!
Thanks Krishna.
@@UnfoldDataScience Simplifying the concept without losing the complexity. One of the best explanations on YouTube. Your channel really deserves more visibility. All the very best Aman.
Great video Sir. You have explained it clearly and in a very simple way. Thanks a lot 🙏
So nice of you Santosh. Please share with friends.
Superb, simple explanation. Thank you very much.
Thanks a lot for this excellent video! I am still curious about how xgboost can achieve parallelization and how it handles missing values as you mentioned before. Looking forward to your new videos!
Sir, in the formula new prediction = old prediction + learning rate × output, I didn't understand how we get the output value of 6 for the second record. Could you explain once again?
The formula is output = sum of residuals / number of residuals (with lambda = 0).
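A short worked version of that reply, for everyone asking the same question further down the thread. The base prediction 30 and learning rate 0.3 are not stated here; they are reconstructed from the residuals 4 and 2.2 quoted elsewhere in this thread, so treat them as an assumption:

```latex
% Leaf output value (note: no square), with lambda = 0:
\text{output} = \frac{\sum_i r_i}{n + \lambda} = \frac{4 + 8}{2 + 0} = 6
% New prediction for the second record (target 34):
\text{new prediction} = 30 + 0.3 \times 6 = 31.8, \qquad
\text{new residual} = 34 - 31.8 = 2.2
```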
Very nicely explained, Thanks Sir. One of the best videos I have seen on CZcams.
Thanks Ruchita.
Very nice video. Brief, concise, to the point. I agree with the others: probably the best explanation so far on YouTube. Way to go, bro.
Your comments are my motivation Samar. Thanks for motivating.
Thanks a lot for this. Very helpful for me as I am brushing up on ML theory for interviewing. Awesome work!
Glad it was helpful!
Q1. How do we interpret the similarity score?
Q2. What is the meaning of a high similarity score versus a low similarity score?
Excellent explanation.
Thanks a lot for watching Miroslav.
finished watching
Very good Aman
Thank you.
YOU ARE TRUE KNOWLEDGE
Thanks Vivek, cheers.
Nice explanation!!!!
Can you please make a video on XGBoost and Gradient Boost where the dependent variable is binary/categorical in nature, say Good/Bad (0/1)?
Great suggestion Subhadip. Noted.
I like the way you explain complex concepts in a simple way. Thanks.
Thanks Vibhaas, your comments motivate me :)
Hey good one again! Continue your good work.. Thanks
Thanks for the feedback.
Good explanation sir. Kindly make a video on SVM and alternating decision trees.
Awesome in-depth explanation, keep up the good work man!
Glad you liked it!
Straight to the point. Thanks
Welcome Babu.
Sir, can you please tell why you didn't square the SR at 12:08?
And can you tell how the output at 14:02 is 6?
What does the output actually mean?
The output value is calculated as the average of the residuals, in our case (4+8)/2 = 6.
1) We square the sum of residuals when we compute the similarity score, not when we make a prediction.
2) When we make a prediction, and assuming lambda is 0, the prediction is just the average of all the values (residuals) in a particular leaf.
3) Output means residuals: we predict the residuals (because it is a boosting algorithm) such that the weighted sum of all the predicted residuals brings the final prediction of our model as close to the target variable as possible.
@@akashkewar Are you sure we "square the sum" and not "sum the square"? The "square of sum" in the video doesn't make sense!
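To settle the square-of-sum confusion running through this thread, a minimal numeric sketch (assuming squared-error regression with lambda = 0, as in the video's example) contrasting the two per-leaf quantities:

```python
# Two different per-leaf quantities in XGBoost regression.
residuals = [4, 8]   # residuals in the right-hand leaf from the video
lam = 0              # regularization parameter lambda

# Similarity score: the SUM of residuals is squared (used to score splits).
similarity = sum(residuals) ** 2 / (len(residuals) + lam)  # (4+8)^2 / 2 = 72

# Leaf output value: no square (used in new prediction = old + lr * output).
output = sum(residuals) / (len(residuals) + lam)           # (4+8) / 2 = 6

print(similarity, output)  # 72.0 6.0
```

So both 72 and 6 are correct; they are simply answers to two different formulas.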
Hi Aman,
please clear this up:
at 12:01, why not the square of the sum of residuals, as you stated in the formula?
Hi Aman, thanks for the video. Please explain how lambda controls overfitting.
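Since this one went unanswered in the thread: a small numeric sketch, using the leaf values from the video, of how lambda damps similarity scores, hitting small leaves hardest, which is how it discourages splits that chase individual (outlier) points:

```python
# Similarity = (sum of residuals)^2 / (n + lambda).
# Because lambda sits next to n, it hurts small leaves the most, and a
# branch isolating an outlier is exactly a small leaf, so a higher lambda
# lowers such gains and makes pruning (gain - gamma < 0) more likely.
residuals = [4, 8]
for lam in (0, 1, 4):
    sim = sum(residuals) ** 2 / (len(residuals) + lam)
    print(f"lambda={lam}: similarity={sim:.1f}")
# lambda=0: similarity=72.0
# lambda=1: similarity=48.0
# lambda=4: similarity=24.0

outlier_leaf = [8]  # a single-point leaf loses half its score at lambda=1
print(sum(outlier_leaf) ** 2 / (1 + 0), sum(outlier_leaf) ** 2 / (1 + 1))  # 64.0 32.0
```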
Great explanation
Glad it was helpful Sunil. You're very welcome Goundo. If possible, please share the link within data science groups. Thanks again.
Sir, you take only one feature for prediction. What if the data has more than one feature? On which criterion does the model select the feature? Is an information-gain-like approach used, or some other approach?
Please explain sir.
Thanks a lot brother... God bless you for your information.
Always welcome Parv.
thank you
Could you explain how to do feature importance using XGBoost?
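Since this went unanswered: a minimal sketch of one common way to get feature importances from the xgboost Python package (the toy data is invented here just so the snippet runs):

```python
import numpy as np
from xgboost import XGBRegressor

# Toy data purely for illustration; replace with your own.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2 * X[:, 0] + rng.normal(size=200)  # only feature 0 really matters

model = XGBRegressor(n_estimators=50)
model.fit(X, y)
print(model.feature_importances_)  # one score per feature; feature 0 dominates
```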
Can you do one for XGBoost classification?
One question: what is the output in the last formula for the new prediction? Which output is it?
Thanks a lot for the lecture. Can you please clarify what happens in the case of a classification problem? I mean, what about the residuals in a classification problem, since there will be no residuals there? How do we interpret these learnings for a classification problem?
Thanks for the excellent work!!
This is the best video on XGBoost.
Thanks Vish.
If we take the mean as the initial prediction, then the sum of the residuals will always be zero if the values are taken as they are (with signs).
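The observation above in symbols: when the initial prediction is the mean, the signed residuals cancel exactly, so the root node's similarity score (with lambda = 0) is zero.

```latex
\sum_{i=1}^{n}\left(y_i - \bar{y}\right) = \sum_i y_i - n\bar{y} = 0
\quad\Rightarrow\quad
\frac{\left(\sum_i r_i\right)^2}{n + \lambda} = \frac{0^2}{n + 0} = 0
```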
Dear Aman ji, one question please... The SS value is the SR squared, but when you are calculating for the value 11 you only take the sum of residuals without squaring them. Please explain how it comes to 6; if we square the SR, the value would be different.
I will check - I may have possibly made a mistake. Did you check the previous comments?
@@UnfoldDataScience Thanks to Aman Ji for reading your viewers' comments and respecting their doubts. I think in one comment you gave the full paper link and one more link for more detail so I will check from there... thanks
Can you just describe the loss function for it?
Is the procedure the same for classification?
too good...!!
Thanks Ganesh
Hello sir, I didn't follow the concept of how it handles outliers. You said it handles outliers, but you have not explained how. As lambda increases, the similarity score decreases, but how is that impacting or taking care of the outliers? I didn't follow it, as I couldn't understand the relationship between them.
Second, let's say a new data point comes, i.e. 11, so it goes to the branch greater than 10. Will a new similarity score be computed, because you now have a 3rd data point, i.e. 11?
So (4 + 8 + 11)^2 / (3 + 0)?
In the end, the new prediction value is subtracted from the IQ to find the new residual value. When new predictions are done, why is the residual value calculated only for 34 and not for 20 and 38?
Very clear ! Thank you !✨🙏
You’re welcome 😊. Please share my videos in various data science groups you are part of, that will motivate me to create more content :)
you are a god...
IT'S TOO MUCH :)
Very informative. Thanks for explaining the concept such that it is understood easily. I just want to understand the effect of an outlier on the base value (Model 0). Since the mean value (which is high in the presence of an outlier) is considered initially to calculate the residuals and for prediction, wouldn't it have a greater impact? Please share your insights.
Yes, exactly Dhinesh, there will be an outlier impact, hence it is better to take care of it before starting training.
Sir, what exactly is the difference between the base model trees created in gradient boosting and XGBoost? Does gradient boosting also use the formula you have shown in the video?
Can we apply L1 and L2 regularization techniques to any algorithm, whether it's linear regression, XGBoost, gradient boosting, random forest, etc.?
Not directly; there are different regularization parameters we can tune in the various algorithms.
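A sketch of what "different parameters in different algorithms" looks like in practice; the parameter names below are from scikit-learn and the xgboost Python package:

```python
from sklearn.linear_model import Ridge, Lasso
from xgboost import XGBRegressor

ridge = Ridge(alpha=1.0)   # L2 penalty on linear-regression coefficients
lasso = Lasso(alpha=0.1)   # L1 penalty on linear-regression coefficients

# In XGBoost the penalties act on the leaf weights instead:
xgb = XGBRegressor(reg_lambda=1.0, reg_alpha=0.0)  # L2 / L1

# Random forest has no explicit L1/L2 term; it is regularized through
# parameters such as max_depth and min_samples_leaf.
```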
Sir, can you please explain how XGBoost works for logistic regression?
Do you mean classification?
@@UnfoldDataScience Yeah, my bad, yes. Thanks for taking the time to reply, sir.
Please make a video on the "Pipeline" of building a model and how it is implemented in production.
Noted.
Do we calculate IG and entropy for the splitting criteria?
No, Python does that for us.
Sir, please do cover LightGBM and its advantages over XGBoost.
Thanks Mayank. Noted.
Hey Aman, you talked about missing value treatment in XGBoost in your previous video... how does XGBoost treat missing values?
Hi Nishi, sorry for the late reply. That would be a little long to explain here. Please check the link below to understand more:
datascience.stackexchange.com/questions/15305/how-does-xgboost-learn-what-are-the-inputs-for-missing-values
14:22, I think the output is 12 squared: 144/(2+0) = 72. Please correct me if wrong...
Need to check
Sir, nice video. Please make a video where the dependent variable is categorical, that is, yes or no.
Ok Vishnu.
Very explanatory video, great work bro. I just need to ask one thing about that output at the end: how did we get 6 as the output? Because (4+8)^2 / (2+0) = 72; if we do not square it we get 6, but the formula is with the square, right? So how did we get 6 as the output? It must be something else (maybe 72, I think). Please explain.
Thanks Jatin, will check that. Thanks for pointing it out.
Hi Aman, Jatin is correct. It should be 72 instead of just 6. If we take 72, the value of the residual is (34 - 51.6) = -17.6.
Please see and suggest if I am correct. Also, is the value of the residual decreasing in this case, from 4 to -17.6? How do we further reduce it so that it is closer to 0?
@@himanshuarora6822 I have the same doubt. Is this cleared up somewhere? Aman, could you please explain?
The output value is calculated as the average of the residuals, in our case (4+8)/2 = 6.
Where is the output 6 coming from? The similarity score for the 2nd branch was 72 according to your formula. I fail to understand, please help.
You can see from the previous tree: (4+8)/(2+0) = 6. Here lambda is 0, as he said...
I think he means that the new prediction is just (4+8)/(number of residuals), not the similarity score. I got confused too. 😁
good explanation :)
Glad you liked it Vishnu.
Sir, how is 6 coming? You missed squaring the sum of 4 and 8, please tell me.
Hi, can you please explain how the output is 6 for the second observation in the table?
Yes, I was looking for the same question, because the output for the right node was 72.
Do you know the answer?
Hi Aman, can you please tell us why the data should be normally distributed, and how does it affect ML models?
To keep it simple: the model gets a wider range to learn from.
Where is gradient descent happening in the algorithm?
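Unanswered above, so a brief note: the "gradient" is in function space; for squared-error loss, the negative gradient of the loss at each point is exactly the residual, so fitting each new tree to the residuals is one descent step.

```latex
L = \tfrac{1}{2}\sum_i \left(y_i - \hat{y}_i\right)^2
\quad\Rightarrow\quad
-\frac{\partial L}{\partial \hat{y}_i} = y_i - \hat{y}_i = r_i
```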
Hello Sir,
Really a very nice explanation for such a complicated algorithm. There is hardly any video which describes the in-depth intuition of XGBoost. Thanks a lot Sir.
One doubt: can you explain how the classification of a new record from the test data set will take place?
Can you create such videos for CatBoost and LightGBM?
Really very nice
Thanks Prayag, I will add videos on CatBoost and LightGBM as well.
Thank you.
Can we generate mathematical equations between the adopted inputs and output parameters after a successful implementation of XGBoost?
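Not answered in the thread, but for what it's worth: XGBoost does not produce one closed-form equation; the fitted model is the base score plus a sum of trees, and the xgboost package can print those trees' split rules and leaf values as text. A sketch, assuming a fitted XGBRegressor named model:

```python
# The prediction is base_score plus the sum of all trees' leaf outputs
# (each tree was fitted to the previous residuals and shrunk by the
# learning rate). Inspect the fitted trees as text rules:
for i, tree in enumerate(model.get_booster().get_dump()):
    print(f"--- tree {i} ---\n{tree}")
```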
Thanks Aman for sharing your knowledge. Great learning. Can you please explain the relation between min_child_weight and Gamma. Do we still need to tune min_child_weight if we are using Gamma values for tuning as the tree is getting pruned by using a higher Gamma?
Hi Rajeev, about tuning your hyperparameters: you should try different combinations to see what works well for your model. We cannot take a generic approach for all data.
Awesome explanation 👍 although it was a bit complicated. Can you create videos on Poisson regression and survival analysis?
Thanks Sourav. Yes I will put that in my list.
Can you make an end-to-end clustering video? How to select variables, the number of clusters, and then the final deployment.
Hello, thanks for the feedback. I will note this topic and create a video in the coming week for sure.
I don't understand how the tree decides which feature becomes the root node... if it depends on the information gain then I get it. A second thing: it would be better if you took more than 3 records in that example, like 5-6, because I am not able to tell whether every row is processed one at a time or the whole set at once.
Hi Sir, I could not find the link to 'how gradient boost works', the theoretical explanation. I found the one which explains why XGBoost is fast and has high performance.
Can you please give me the link to how XGBoost works?
Sir, please make videos on RNN and LSTM.
Hi Ajay, It will come for sure
Hi Aman!!! I have a question: how can I predict gender from mobile phone data with an XGBoost algorithm?
You need to create data such that your target column is gender, and then you can run the XGBoost classifier.
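A minimal sketch of that setup; the file name and column names below are hypothetical placeholders for your own mobile-usage data:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Hypothetical dataset: feature columns plus a 'gender' label column.
df = pd.read_csv("phone_usage.csv")
X = df.drop(columns=["gender"])
y = (df["gender"] == "female").astype(int)  # encode the target as 0/1

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)
clf = XGBClassifier()
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # held-out accuracy
```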
At 7:19 is it not Age < 10 instead of Age > 10?
Probably the best on YouTube. It would be really great if you could make a video on the books you have learnt from and, if possible, provide book links to Amazon.
Thanks Giridhar. On books, please find my recommendation below; you will find links to buy in the description of the same video:
czcams.com/video/jDwqjmW1Fcg/video.html
Are all models M1, M2, etc. the same, i.e. the same model, data, tree, and features used?
It depends on what your M1 and M2 are; usually the same.
In the new prediction, which value do you take as the output?
Which part of the video Saurabh?
@@UnfoldDataScience Maths part...
Hi Aman, only one residual changed, i.e. 2.2. What about the rest? How do we get the remaining residuals?
In a similar way; I just gave one example, Hitesh.
I don't understand the formula:
first you used the square of the sum of residuals,
and then you used only the sum of residuals.
What is the reason?
Is knowing the math behind an algorithm a must, or is knowing how the algorithm works enough? Please, please, please give a reply.
Knowing the math is a must.
I'm confused, because others told me that if I want a research-related job (improving machine learning or creating new algorithms) then I must learn the math behind the algorithms, whereas for a normal data science job it is enough to know how an algorithm works, and knowing the math behind it is not a must. Please give a reply.
🙇♂️
:)
XGBoost = algorithm or framework?
Please explain.
Internally it is a framework; however, the implementation is available in Python, hence we call it an algorithm.
How did we decide on the age splitting criterion?
I just took it as an example here.
How to set or calculate the gamma value?
Good question. It's subjective, based on how the model behaves with the data. We can give a range and decide to tune it.
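One common way to pick a gamma range and tune it, sketched with scikit-learn's grid search (the toy data is invented here just so the snippet runs, and the candidate gamma values are only an example range):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

# Toy data purely for illustration; replace with your own.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X[:, 0] - X[:, 1] + rng.normal(size=200)

# The right gamma range depends on the scale of your gains.
search = GridSearchCV(XGBRegressor(n_estimators=50),
                      {"gamma": [0, 0.1, 1, 5, 10]},
                      cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_)
```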
Hi Aman, please make a Telegram or WhatsApp group where we can connect with you and ask queries.
I'll create one for sure, Yash.
Where has the value 6 come from??
Which 6? Can you point me to the timing in the video?
Hi Aman Sir,
Can you please explain how parallelism happens, since it runs in a sequential manner? The next model requires the previous model's output.
Thanks,
Tapas
Hi Tapas, the parallelism is not in terms of model training. I was talking about parallelism in terms of hardware, for example using multiple cores of the processor; not to be confused with model training.
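A sketch of what that hardware-level parallelism looks like from the Python API: the boosting rounds stay sequential, but each single tree is built using multiple CPU cores (the toy data is invented here just so the snippet runs):

```python
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = X.sum(axis=1) + rng.normal(size=1000)

# Trees are fitted one after another (each needs the previous residuals),
# but within each tree build, candidate splits are evaluated in parallel.
model = XGBRegressor(n_estimators=100, n_jobs=4)  # use 4 cores per tree build
model.fit(X, y)
```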
The learning rate was 72 (144/2). How did it change to 6?
Hi Navneet, can you let me know the time in the video? I will play and check that part.
@@UnfoldDataScience After 12:00, when you added another input (11): you took lambda as zero but forgot to square the numerator.
The output value is calculated as the average of the residuals, in our case (4+8)/2 = 6.
Where are you working sir?
Hi Nikul, please ask queries related to Data Science only.
Please don't copy the examples; please present new examples.
Feedback taken. Happy New Year.
Great explanation.
Thanks a lot Sandipan.