eXtreme Gradient Boosting XGBoost Algorithm with R - Example in Easy Steps with One-Hot Encoding
- Added 19. 05. 2024
- Provides an easy-to-apply example of the eXtreme Gradient Boosting (XGBoost) algorithm with R.
Data file and R code: github.com/bkrai/Top-10-Machi...
Machine Learning videos: goo.gl/WHHqWP
Timestamps:
00:00 eXtreme Gradient Boosting XGBoost with R
00:04 Why eXtreme Gradient Boosting
00:34 Packages and Data
02:02 Partition Data
03:25 Create Matrix & One Hot Encoding
07:35 Parameters
09:59 eXtreme Gradient Boosting Model
11:51 Error Plot
16:50 Feature Importance
18:00 Prediction and Confusion Matrix - Test Data
24:03 More XGBoost Parameters
Includes:
- Packages needed and data
- Partition data
- Creating matrix and One-Hot Encoding for Factor variables
- Parameters
- eXtreme Gradient Boosting Model
- Training & test error plot
- Feature importance plot
- Prediction & confusion matrix for test data
- Booster parameters
R is a free software environment for statistical computing and graphics, and is widely used in both academia and industry. R works on both Windows and macOS. It was ranked no. 1 in a KDnuggets poll on top languages for analytics, data mining, and data science. RStudio is a user-friendly environment for R that has become popular.
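The steps listed above can be sketched in R roughly as follows. This is a minimal, hedged outline of the workflow the video walks through, not the video's exact code; the column names (admit, rank) follow the admissions dataset used there, and the file name "binary.csv" is an assumption — adjust to your own data.

```r
library(xgboost)
library(Matrix)   # for sparse.model.matrix()

# Packages and data
data <- read.csv("binary.csv")
data$rank <- as.factor(data$rank)

# Partition data
set.seed(1234)
ind <- sample(2, nrow(data), replace = TRUE, prob = c(0.8, 0.2))
train <- data[ind == 1, ]
test  <- data[ind == 2, ]

# Create matrix and one-hot encoding ("-1" drops the intercept column)
trainm <- sparse.model.matrix(admit ~ . - 1, data = train)
train_label <- train$admit
train_matrix <- xgb.DMatrix(data = as.matrix(trainm), label = train_label)
testm <- sparse.model.matrix(admit ~ . - 1, data = test)
test_label <- test$admit
test_matrix <- xgb.DMatrix(data = as.matrix(testm), label = test_label)

# Parameters and model
nc <- length(unique(train_label))
params <- list(objective = "multi:softprob",
               eval_metric = "mlogloss",
               num_class = nc)
watchlist <- list(train = train_matrix, test = test_matrix)
bst_model <- xgb.train(params = params, data = train_matrix,
                       nrounds = 100, watchlist = watchlist)

# Prediction and confusion matrix on test data
p <- predict(bst_model, newdata = test_matrix)
pred <- matrix(p, nrow = nc, ncol = length(p) / nc)
pred_label <- max.col(t(pred)) - 1
table(Prediction = pred_label, Actual = test_label)
```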
Best video on the internet on XGBoost, you just saved my paper. Thanks a lot :)
You're welcome!
I agree 100% with you.
Thanks a lot, Prof! You sent me the link to this video and it really helps. But as someone suggested in the comments, the parameters in the model are key, and a more detailed explanation of them and of the algorithm as a whole would really be appreciated too. I am blessed to be a subscriber of your videos!
Thanks for your comments and suggestion!
I got a much higher level of clarity in the concept of xgboost model and parameter usage with this video. Thanks a lot Sir
Thanks for comments!
Hi Bharatendra, I derive a lot of value from your tutorials, which strike the right balance between being simple yet very useful. Love them!
Thanks for your feedback and comments!
Such an elaborate explanation.
Please keep posting such videos. They will be very useful for the community.
I've benefitted a lot from this video.
Thank you, I will
Best clarity so far on XGBoost, it helped a lot in my final project and in learning more about this algorithm compared to GBM.
Thanks for comments!
Respect you sir. The kind of knowledge you are sharing from Massachusetts is very very helpful. Thank you so very much Sir.
Thanks!
You are a skillful tutor. Keep it up, and Happy New Year!
Happy New Year 2018!
Thank you so much, this is the video I have been looking for for a long time. I didn't find anything as interesting elsewhere; you have explained everything in detail, and it's interesting too.
Thanks for comments!
Thank you Sir for making it so easy
This helped so much on a classification project I am doing. Much thanks!
You're very welcome!
Fantastic tutorial, thank you!
Thank you for the tutorial.
Given that you have a binary target, I was wondering why you haven't used objective = 'binary:logistic' and eval_metric = 'logloss'.
Is there a downside to using "multi:softprob" for a binary classification problem, when it is typically used for multiclass classification where n > 2? I'd appreciate it if you could help clarify this.
That was a very good tutorial! I wonder whether and how we could use cross-validation for choosing the eta, gamma, nrounds, etc. parameters. I would be happy to have any suggestions.
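One way to do what this comment asks is xgb.cv() from the xgboost package. The sketch below (an illustration, not from the video) assumes train_matrix is the xgb.DMatrix built earlier and compares a few eta values by cross-validated mlogloss:

```r
library(xgboost)

# Try several learning rates with 5-fold cross-validation
for (eta in c(0.05, 0.1, 0.3)) {
  cv <- xgb.cv(params = list(objective = "multi:softprob",
                             eval_metric = "mlogloss",
                             num_class = 2,
                             eta = eta),
               data = train_matrix, nrounds = 100, nfold = 5,
               early_stopping_rounds = 10, verbose = 0)
  best <- min(cv$evaluation_log$test_mlogloss_mean)
  cat("eta =", eta, "best test mlogloss =", best, "\n")
}
```

The same loop idea extends to gamma, max_depth, and other parameters.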
After weeks of searching for videos on using XGB and predicting continuous variable, I could not find any decent videos... nor were any of them as well explained (and entertaining) as your videos. Please make one for the community? Best wishes from London, UK
Thanks for the suggestion and comments, I'm adding this to my list of future videos.
Thank you for this tutorial. Awesome. Step by step explanations made things much easier to understand
You're very welcome!
You may also find this useful:
czcams.com/video/GmkHvDs0GG8/video.html
@@bkrai Thank you very much
When you find time, kindly have a look at my channel on R. Everything is like a standalone application.
czcams.com/channels/DmEAmoLuyE0h61aGpthGvA.html
You are welcome!
Very easy to follow, no errors in code, just great.🤓🙂
Great to hear!
Thank you sir! awesome explanation skills with depth of algo
Thanks for your comments and finding it useful!
Hi Sir, your videos are great. Let me ask you a question: I have read that it is possible to implement survival analysis (Cox regression) with the xgboost package, by indicating "survival:cox" as the learning task parameter. I haven't found any tutorial on this. Do you know if it is necessary to do any extra work? For example, to specify the time variable somewhere else? Thanks in advance.
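An untested sketch of what the question describes (not covered in the video): in the xgboost package, "survival:cox" takes the time variable as the label, with negative values marking right-censored observations. X, time, and status below are placeholders for your own objects.

```r
library(xgboost)

# Positive label = observed event time; negative label = censored time
surv_label <- ifelse(status == 1, time, -time)
dtrain <- xgb.DMatrix(data = as.matrix(X), label = surv_label)

cox_model <- xgb.train(params = list(objective = "survival:cox",
                                     eval_metric = "cox-nloglik"),
                       data = dtrain, nrounds = 100)
```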
Thanks for the model. A big help for me.
Thanks for comments!
your explanation is awesome !
Thanks!
Cheers Amazing Video Mate!
Thank you, Sir, for explaining the model so well. I am doing something similar with my data. How can I show the probabilities of predictors (similar to the one in a decision tree)?
Very very informative. Thanks!
Thanks for comments!
Hello Sir, can you please share an example where the response variable is continuous?
thank you for your sharing.
We can also increase the range on the y-axis by using the following lines:
plot(e$iter, e$train_mlogloss, col = "blue", type = "l", ylim = c(0, 1))
lines(e$iter, e$test_mlogloss, col = "green")
legend("topright", legend = c("Training Error", "Testing Error"), lty=c(1,1), col = c("blue", "green"))
But I guess for the purposes of this video not using the ylim parameter may be intentional and warranted.
Thank you for the great video as always
thanks!
Thank you sir, This is very helpful
thanks!
Excellent!
thank you Sir...
Simply Awesome and excellent ..
Thanks for comments!
Thanks for the amazing XGBoost tutorial! I can't believe you make every application of machine learning so easy. I really want your help figuring out how to apply XGBoost to time-to-event data. There are very limited resources on XGBoost with the Cox model. Do you have any suggestions? Thanks.
I don't have one at this time, but I have added it to my list.
Thank you it was very helpful!!
Thanks for comments!
Hi sir. Your video is very good and easy to understand. I have one question: what classifier algorithm is used in the xgboost package for classification? I read on another website that the package includes "tree learning algorithms". Is it a decision tree algorithm? Thank you in advance for your clarification.
Thank you so much
Excellent video, thanks!
Thanks for comments!
Sir, can you please make a video on stacking models built from different DL models? Thanks a lot for the informative videos, sir.
Hi Sir, let me ask a question. In a binary classification context, how do you predict when it is not possible to know the values of the target variable, i.e. in a forecasting scenario? I mean, you need to forecast a result and have a new dataset without the response variable; you don't know whether a student will be admitted or not, but still need to make a prediction using xgboost.
I tried to do this by setting, in the "test set" (the new dataset without the response variable), an outcome variable with a fixed value (0, for instance) to be able to run xgboost; however, the prediction is pretty inaccurate.
Thanks very much!
Thank you very much. You helped me a lot.
Thanks for comments!
Hi sir, what line of code should I add if I want the confusion matrix to also display the 95% CI and test p-value? Great lecture, thank you.
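One option (an illustration, assuming pred and test_label come from the prediction step in the video): caret's confusionMatrix() already reports the accuracy 95% CI and the p-value of accuracy versus the no-information rate.

```r
library(caret)

# Both arguments must be factors with the same levels
cm <- confusionMatrix(factor(pred), factor(test_label))
print(cm)   # output includes "95% CI" and "P-Value [Acc > NIR]"
```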
Thank you for the tutorial, it really helped my understanding. I have a question: why can't we do dummy encoding for categorical variables in XGBoost?
You may try. It should work fine.
thanks a lot for the explanation.
Thanks for comments!
I am enjoying watching your videos, starting from the simplest to the more complicated ones! Thank you, Dr. Rai, for your great explanation. I have one question, though: when you divide the data into train and test data, you use data[ind==1, ] and data[ind==2, ]; it is not clear to me how this magically works. What I see is data[x, y], where the only values x can take are blank and integers from 1 to 400, and the only values y can take are blank and integers from 1 to 4. Can you explain what is going on? Or is there anything I am missing?
You can refer to this for explanation:
czcams.com/video/RBojq0DAAS8/video.html
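In short, the partitioning lines work like this (a sketch of the idea): sample() draws one value from c(1, 2) per row, with probabilities 0.8 and 0.2, and indexing with ind == 1 keeps the rows labeled 1 (all columns, since the column index is left blank).

```r
set.seed(1234)
ind <- sample(2, nrow(data), replace = TRUE, prob = c(0.8, 0.2))
train <- data[ind == 1, ]   # ~80% of rows, every column
test  <- data[ind == 2, ]   # ~20% of rows, every column
```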
Very useful , thank you!
Thanks for comments!
Thanks for the video!
A quick question though: What's the motivation behind the 'prob' vector in 'ind
prob specifies the sampling probabilities for the two partitions. For more details about data partitioning, you can look at this link:
czcams.com/video/aS1O8EiGLdg/video.html
Also date variables are handled differently. Probably I'll do a video about it later.
Thanks very much for this tutorial - definitely made things easier to understand.
I have a question regarding "objective" = "multi:softprob" in the parameter section. The admission problem in the example is a binary (logistic) problem, right? So why should we use multi:softprob instead of binary:logistic? If I try the model with the binary:logistic objective, my model fails.
It would be great if you could help me out on when to use which objective! Thanks.
Multi works for 2 or more levels.
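For comparison, here are the two objectives side by side (a sketch; train_matrix, test_matrix, and watchlist as built in the video). If binary:logistic "fails", one common cause is leaving num_class in the parameter list — it must be dropped for the binary objective:

```r
library(xgboost)

params_binary <- list(objective = "binary:logistic",
                      eval_metric = "logloss")        # note: no num_class here
bst_binary <- xgb.train(params = params_binary, data = train_matrix,
                        nrounds = 100, watchlist = watchlist)

p <- predict(bst_binary, newdata = test_matrix)       # one P(admit = 1) per row
pred <- ifelse(p > 0.5, 1, 0)
```

With multi:softprob and num_class = 2, predict() instead returns two probabilities per row, which is why the video reshapes the output into a matrix.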
Very excellent explanation, lots of thanks.
I have one doubt: is it possible to use image data, especially satellite data?
For image data, deep learning is more effective. You can explore the 'deep learning' playlist on this channel.
Hello Sir,
In a real scenario, where we have separate test data with no dependent variable, how will sparse.model.matrix work?
Dear Rai, I hope you are doing well. I have one question. I am building a machine learning model using the RandomForest and XGBoost algorithms. My data is a survey of samples drawn from a large population. It has a sampling weight, which is the number of individuals in the population each respondent in the sample represents. How can I apply this sampling weight in my ML model? The data also contains strata and clusters. Do I have to keep the sampling weight, strata, and cluster variables among my features?
Thanks a lot
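For the xgboost side of this question, an untested sketch: xgb.DMatrix accepts a per-row weight vector, so the survey sampling weight can be supplied there instead of being kept as a feature. X (feature matrix), y (label), and w (the weight column) are placeholders for your own objects.

```r
library(xgboost)

# Per-row weights tell xgboost how much each observation should count
dtrain <- xgb.DMatrix(data = as.matrix(X), label = y, weight = w)
model <- xgb.train(params = list(objective = "binary:logistic"),
                   data = dtrain, nrounds = 100)
```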
Thank you so much for your instructive and insightful tutorial!
I've one question:
Do I only need one hot encoding for my inputs / features?
What about the outputs, is xgboost able to forecast a categorical variable as a label?
Or should I make one hot encoding for my labels as well?
Kind regards
Jonathan
For XGBoost, the response variable also needs to be numeric. In the example that I used, admit is a factor variable, but since it has the two values 0 and 1 in numeric form, we didn't do anything. For a further explanation of the variables, you can also refer to this link:
cran.r-project.org/web/packages/xgboost/vignettes/discoverYourData.html
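When the response is a factor with non-numeric levels, the conversion looks like this (a small illustration; xgboost expects a 0-based integer label):

```r
y <- factor(c("no", "yes", "no", "yes"))
train_label <- as.numeric(y) - 1   # factor levels 1..k become 0..k-1
train_label                        # 0 1 0 1
```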
Thank you very much for your explanations and the link!
What, in your opinion, is more suitable in multiclass cases? Suppose we have one categorical variable with 10 classes (0 to 9), where every number is a class. What do you think is better?
1. Make one model to forecast this categorical variable, getting 10 different probabilities that sum to 1.
2. Make 10 different models, each forecasting yes or no (0 or 1) for one of the 10 classes.
In the end we take the class whose model gives the highest yes-probability as the forecast.
Thanks in advance
Jonathan
The code has room for improvement. For instance, when splitting the data, instead of using sample() you can use createDataPartition(), in order to preserve the proportions of the categories in the Y variable. That improvement takes accuracy from 0.7066667 to 0.7375.
Another improvement is to use, say, 10-fold cross-validation instead, with the caret R package and train().
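A sketch combining the two suggestions above: a stratified split with createDataPartition(), then 10-fold cross-validation through caret's train(). Column names follow the admissions data from the video; this is an illustration, not the commenter's exact code.

```r
library(caret)

set.seed(1234)
# Stratified 80/20 split that preserves the class proportions of admit
idx <- createDataPartition(data$admit, p = 0.8, list = FALSE)
train_df <- data[idx, ]
test_df  <- data[-idx, ]

# 10-fold cross-validated xgboost via caret
fit <- train(factor(admit) ~ ., data = train_df,
             method = "xgbTree",
             trControl = trainControl(method = "cv", number = 10))
pred <- predict(fit, newdata = test_df)
```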
Thanks for sharing!
I have a basic question. In logistic regression using the glm function, we get a model with the predictors included in it. But here, I don't know which predictors are included in bst_model. Could you please guide me on extracting those predictors from bst_model? Thank you very much.
Thank you! I wish you would use caret more, though.
Thanks for the suggestion!
I have seen your lectures on logistic regression and randomForest as well; they are awesome. Do we need cross-validation in these ML methods? I haven't observed any cross-validation step in your lectures on LR, RF, and xgboost.
I've split the data into train and test. But there is no harm in doing CV.
Can we use xgboost and adaboost for multiclass models?
When using 'adaboost' I'm getting the following error:
"Error: Dependent variables must have two levels"
My dataset has 3 levels. Your inputs will be helpful and appreciated!
How do I interpret cover, etc.? Also, how can we do grid search here for optimization?
Thanks, sir, brilliant.
Thanks for feedback!
Hello,
is it possible to change the cutoff of the XGB model prediction?
In my model evaluation phase I had a case where the AUC in the ROC curve of one model was higher than another's,
despite a clearly worse confusion matrix and accuracy. My guess is that this could be a cutoff issue.
Kind regards
Jonathan
ROC curve already makes use of various cutoffs to draw the curve. With one cutoff value we will just get one point and not a curve. Looking at two curves can give you better idea about the reasons behind AUC difference.
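If you do want a single-cutoff prediction, you can threshold the probabilities yourself (a sketch, assuming a binary:logistic model so that predict() returns one probability per row):

```r
p <- predict(bst_model, newdata = test_matrix)

# Compare accuracy at several cutoffs instead of the default 0.5
for (cut in c(0.3, 0.5, 0.7)) {
  pred <- ifelse(p > cut, 1, 0)
  cat("cutoff:", cut, " accuracy:", mean(pred == test_label), "\n")
}
```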
Thank you for explaining clearly. If I have five character independent variables in the dataframe and I don't want to drop them, how can I proceed with this concept? That is, how would the character variables be converted to numeric data?
You can do one-hot encoding as shown in the video.
Great video! Do you know how to get confidence or prediction interval for xgboost in r? Thanks
You can get more details here:
czcams.com/video/hCLKMiZBTrU/video.html
Thanks for the instructive video, Sir. I am using a test set that does not contain the dependent variable column, because I am supposed to predict that column in a regression problem. How should I edit the script for test_label and watchlist? Thank you.
You can try this:
new_matrix
Thanks, Rai, for your helpful tutorial! It really helped me understand and do XGBoost in R. I have a question: if I want to work on a regression problem, can I use the same code, or which parameters should I modify? Hope to hear from you soon.
You can see an example here:
czcams.com/video/hCLKMiZBTrU/video.html
You can also get some practice by doing this competition:
czcams.com/video/Dn028hqWnUA/video.html
@@bkrai , really helpful! Thanks again for your detail tutorial. Wish you all the best!
You are welcome!
Thankyou Sir,
Please also give guidance on how to install the LightGBM package in R and its uses.
Thanks, I've added it to my list.
absolute legend!
Thanks for comments!
Thanks for the video. In what scenarios should we use eXtreme Gradient Boosting?
You can use it for better accuracy and faster runs compared to many other methods.
thanks a lot :)
Great tutorials :)
Thanks for comments!
you are a legend!!!
Thanks for comments!
Hi Bharatendra, nice and very useful video. I have a question: in my case I have around 4.5 lakh observations and 250 features. I am trying to run XGBoost, and it's taking some time, which is OK. Note: my data is highly class-imbalanced, with 75% 0's and 25% 1's. Do you suggest using XGBoost here? Thanks!
I would suggest take care of class imbalance problem (CIP) before running XGBoost. It will improve accuracy significantly. Here is the link for CIP:
czcams.com/video/Ho2Klvzjegg/video.html
Is there a video for checking the model using chi-square?
Thank you so much for the tutorial.
I have a question:
how do I plot the ROC curve and compute AUC on the same data set? Can you provide the code for the ROC curve and AUC?
Here is the link:
czcams.com/video/ypO1DPEKYFo/video.html
Hello Bharatendra Rai,
did you make a video about setting up feature selection in R?
It would be very useful for the case where you have lots of features/inputs and you want to find out
which of these features are relevant, to determine a feature subset for the classifier.
Kind regards
Jonathan
I'll be doing one in August.
Bharatendra Rai Looking forward to it! 👍 Thank you for your deep and to-the-point data science tutorials - I recommend them in Karlsruhe to every student who wants to run ML models in R.
Thanks for your comments and recommendations!
Is there any example of using XGBoost to predict a continuous outcome? It seems that this video covers the classification case.
Thanks for the video. I have a question: why didn't you use objective = "binary:logistic"?
Yes, that should be more appropriate.
Hello Sir, can we add hyperparameter tuning in XGBoost? If yes, then how?
Thank you so much, sir, for your in-depth tutorials. Sir, could you please post the GitHub link for the code as well?
Link to the code is in the description area below the video.
Link to GitHub: github.com/bkrai/Top-10-Machine-Learning-Methods-With-R
Hello Professor, thank you for this video.
I'm receiving this error after running the same line of code you have on line 22. Any ideas on how to resolve it?
Error in setinfo.xgb.DMatrix(dmat, names(p), p[[1]]) : The length of labels must equal to the number of rows in the input data
The following provides a clue: "length of labels must equal to the number of rows in the input data".
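A quick check for this error (names as in the video's code): the label vector and the one-hot matrix must have the same number of rows.

```r
nrow(trainm)          # rows in the one-hot encoded matrix
length(train_label)   # must equal nrow(trainm)
```

A mismatch usually means the label was taken from a different data frame than the one passed to sparse.model.matrix(), or rows with NAs were dropped in only one of the two.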
Can you please upload videos on LSTM in Keras in R for numerical, categorical, and multiclass outcomes? It would be really great.
Thanks for the suggestion! It's on my list for future videos.
Also, Prof Rai, I am building an ensemble model of Random Forest and XGBoost with R. My response variable has 2 levels, 'Low' and 'High', and its type in R is factor. Without converting these to 0's and 1's, can I build the model? Also, some of my predictor variables have levels A, B, C, D, and E, and their types as detected by R are factors. Do I have to convert these to zeros and ones even though they are factors before I use them?
When you use random forest, you do not need to convert categorical independent or dependent variables to numeric. But you definitely need numeric variables when using xgboost.
Your explanation helped a lot, thanks. I am building an ensemble of Random Forest and XGBoost on a classification problem. I have imbalanced data, so I used your video to balance ONLY my training data (I hope that's all I need to do in terms of balancing?). After balancing, I applied your one-hot encoding tutorial to both my balanced train data and my unbalanced test data. My XGBoost is running well, though I am yet to test it. BUT the problem is the Random Forest. When I pass the data through the RF, I get the error message below:
Error in t.default(x) : argument is not a matrix
In addition: Warning messages:
1: In randomForest.default(x, y, mtry = mtryStart, ntree = ntreeTry, :
The response has five or fewer unique values. Are you sure you want to do regression?
2: In is.na(x) :
is.na() applied to non-(list or vector) of type 'externalptr'
What could be the solution? Your help would be greatly appreciated, Prof Rai!
First, I should thank you for providing such a helpful educational channel. Thanks a lot, Sir. I have a question regarding the factor parameter.
Should I turn all integer values into factors? Because I got the error: "xgb.DMatrix(data = as.matrix(train), label = train_label) :
REAL() can only be applied to a 'numeric', not a 'integer'"
Could you please explain how you chose the rank column to turn into a factor and matrix variable?
Best Regards,
I used rank as an example for dealing with factor variables. In your dataset if you have any factor variable, you can handle it in a similar manner.
Would you consider using caret and calling xgboost there directly? Is there a benefit to this direct method versus using caret? Thank you
That should also work fine. As long as we use the same method, model performance is not likely to be significantly different.
Hi Rai, hope everything is going well. I am currently working on an ML algorithm with a continuous outcome variable. I am new to regression models. I want to develop randomForest and XGBoost regressions. Can I ask for any reference videos and code related to a regression algorithm using RandomForest and XGBoost?
Refer to:
czcams.com/video/hCLKMiZBTrU/video.html
Hi Bharatendra, I tried searching for bagging/boosting and SMOTE videos in your playlists. Aren't they out yet? If not, waiting to see them :)
Not yet.
Hi Rai. Great job. I have one question: how can we construct the ROC curve and AUC for the XGBoost model?
See if this helps. It has more detailed coverage:
czcams.com/video/ftjNuPkPQB4/video.html
@@bkrai Thank you so much
You are welcome!
Please give an explanation of the algorithm itself, so that it's easier to understand.
Hi Prof., I have come again with a question, since I am learning a lot from your videos. Could you please explain the 'eta' parameter in xgboost in detail? Also, I want to report the AUC metric for my xgboost model and need your guidance; I have seen examples on Google, but I get errors when I try. I am making a presentation on xgboost soon. Your help will be appreciated.
eta is the learning rate. When it is high, computation is faster, but you may miss the optimum. When it is low, computation is slower, but there is a better chance of hitting the optimum. Depending on the data size and problem, we try various values to explore what is best for a given problem. For AUC you can try this:
czcams.com/video/ypO1DPEKYFo/video.html
Hi Sir,
Why have you used "-1" in the sparse.model.matrix function?
Does it specify that the first column is not to be included, or does it exclude only one column, i.e., the response variable?
The number of classes is 2, so if we put -1 those classes become 0 and 1; in this case 0 is not admitted and 1 is admitted.
Thanks for the update!
Here is an update: "-1" removes the extra intercept column which this command creates as the first column.
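A quick way to see what "-1" removes (a small illustration with the admissions columns): without it, model.matrix() puts an "(Intercept)" column of 1s first; with a factor on the right-hand side, "-1" also keeps all factor levels instead of dropping one, which is what makes the one-hot encoding complete.

```r
head(model.matrix(~ gre + rank, data = data))      # first column: (Intercept)
head(model.matrix(~ gre + rank - 1, data = data))  # intercept column gone
```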
Thank you for your valuable video. I have a question: the bst_model step does not work for me. My data has 122 classes. When I run it, R displays the error "label must be in [0, num_class)". I have tried many nrounds values in the range 0 to 122, but it hasn't worked. Hope to get your response. Many thanks!
I think 122 is too many classes. Make sure the labels are coded 0 to 121 and that you have enough data for each class; otherwise there could be issues.
@@bkrai Do you have any solution to handle it, Dr.?
It is difficult to say much without looking at the data.
Can you please tell me which editor you used?
I use Final Cut Pro.
I have one query. Here in this example we know the response variable in the test set, since we divided the actual data 80/20. But in real life, as in Kaggle competitions, we need to predict on a test set given by Kaggle, where we must predict the response variable. So how does that fit into the above code, i.e., how do we do prediction on an actual test set in xgboost? Thanks in advance.
This code will not change much, but you will definitely have to make some adjustments before you can correctly submit your file on Kaggle. You can refer to this example:
czcams.com/video/4ld-ZfrCc0o/video.html
Q: What's stopping someone from just changing all their variables to numeric types and skipping the one-hot encoding process altogether? Does it hurt the prediction?
I would suggest trying both and comparing results.
Hi Rai, my test data doesn't have response variables; I need to predict them. What should I do with all the test_matrix stuff?
You can artificially create it and fill with zeros.
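A sketch of that suggestion: give xgb.DMatrix a placeholder label of zeros for the new data. It is only there to satisfy the function and is ignored at prediction time. new_data is a placeholder for your unlabeled dataset.

```r
library(xgboost)
library(Matrix)

# One-hot encode the unlabeled data, then attach a dummy all-zero label
newm <- sparse.model.matrix(~ . - 1, data = new_data)
new_matrix <- xgb.DMatrix(data = as.matrix(newm),
                          label = rep(0, nrow(new_data)))

p <- predict(bst_model, newdata = new_matrix)
```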
Bharatendra Rai thanks sir, will try
I have one question: if you have created sparse matrices for the train and test sets, why are you using as.matrix on trainm in xgb.DMatrix? A sparse matrix can also be used directly. I am confused about xgb.DMatrix and the sparse.model.matrix step before it.
Another question: if the response variable is in position 43 rather than 1, do you still use -1 in the sparse matrix?
Thank you so much for the video, it's really nice, but I have these questions about my dataset. Hoping for your reply, thanks.
For the 1st question, I would suggest trying it and seeing if it works. If it works, then you are fine.
I didn't fully understand the 2nd question. Are you referring to code line 43?
@@bkrai I appreciate your reply. For my dataset, using as.matrix on the sparse.model.matrix output gives me an error, so I am better off using the sparse.model.matrix variable directly in xgb.DMatrix. That is all clear now. You are getting mlogloss, but I was getting merror, with the same parameters as yours.
Hi Sir, I have a confusion: at 4:18 you mentioned putting -1 because "admit" is the first column in the dataset, but according to this blog, www.analyticsvidhya.com/blog/2016/01/xgboost-algorithm-easy-steps/, "-1" removes an extra column which this command creates as the first column.
Please confirm.
You are right. Once 'admit' appears before the ~ symbol, it is automatically left out.
I am not able to get $evaluation_log in bst_model. Is there anything I am missing?
Hi Sir, nice and very useful video. I want to ask: when I use the XGBoost algorithm, do I no longer need to use linear and logistic regression?
I want to use the XGBoost algorithm for this problem: www.kaggle.com/c/house-prices-advanced-regression-techniques
It's better to try more methods and then see which one performs better.
@@bkrai okay sir thanks.
@@bkrai Sir u r really great man.
Sir, can you please upload a video on adaptive boosting in R? Thanks in advance.
Thanks for the suggestion, I've added it to my list.
Thanks so much!
Thanks for comments!
Hi, thanks for the video. I have a problem, I think. When I do feature importance, I am also getting the target column in it. My target column is 'dismissed' and I put it in the first column. This is how I am loading it:
train
I think lines 3 to 6 are not needed.
Sir, how can we optimize hyperparameters in the case of the xgboost algorithm?
Refer to this:
czcams.com/video/GmkHvDs0GG8/video.html