eXtreme Gradient Boosting (XGBoost) Algorithm with R - Example in Easy Steps with One-Hot Encoding

  • Published 19 May 2024
  • Provides an easy-to-apply example of the eXtreme Gradient Boosting (XGBoost) algorithm with R.
    Data file and R code: github.com/bkrai/Top-10-Machine-Learning-Methods-With-R
    Machine Learning videos: goo.gl/WHHqWP
    Timestamps:
    00:00 eXtreme Gradient Boosting XGBoost with R
    00:04 Why eXtreme Gradient Boosting
    00:34 Packages and Data
    02:02 Partition Data
    03:25 Create Matrix & One Hot Encoding
    07:35 Parameters
    09:59 eXtreme Gradient Boosting Model
    11:51 Error Plot
    16:50 Feature Importance
    18:00 Prediction and Confusion Matrix - Test Data
    24:03 More XGBoost Parameters
    Includes,
    - Packages needed and data
    - Partition data
    - Creating matrix and One-Hot Encoding for Factor variables
    - Parameters
    - eXtreme Gradient Boosting Model
    - Training & test error plot
    - Feature importance plot
    - Prediction & confusion matrix for test data
    - Booster parameters
    R is a free software environment for statistical computing and graphics, and is widely used in both academia and industry. R works on both Windows and macOS. It was ranked no. 1 in a KDnuggets poll of top languages for analytics, data mining, and data science. RStudio is a user-friendly environment for R that has become popular.
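
    For reference, a minimal end-to-end sketch of the workflow covered in the video. The file name, column names (admit, gre, gpa, rank), and parameter values are assumptions based on the admission example; treat this as a sketch, not the exact repository code:

        library(xgboost)
        library(Matrix)

        data <- read.csv("binary.csv")        # assumed file name
        data$rank <- as.factor(data$rank)     # 'rank' is the factor variable

        # Partition data (80/20)
        set.seed(1234)
        ind <- sample(2, nrow(data), replace = TRUE, prob = c(0.8, 0.2))
        train <- data[ind == 1, ]
        test  <- data[ind == 2, ]

        # One-hot encoding; "-1" drops the intercept column
        trainm <- sparse.model.matrix(admit ~ . - 1, data = train)
        testm  <- sparse.model.matrix(admit ~ . - 1, data = test)
        train_matrix <- xgb.DMatrix(data = as.matrix(trainm), label = train$admit)
        test_matrix  <- xgb.DMatrix(data = as.matrix(testm),  label = test$admit)

        # Parameters (two-class problem run as multi:softprob, as in the video)
        params <- list(objective = "multi:softprob",
                       eval_metric = "mlogloss",
                       num_class = 2)

        # Model with train/test error tracked via the watchlist
        bst_model <- xgb.train(params = params, data = train_matrix,
                               nrounds = 100,
                               watchlist = list(train = train_matrix,
                                                test = test_matrix))

        # Feature importance, prediction, and confusion matrix
        imp <- xgb.importance(colnames(trainm), model = bst_model)
        xgb.plot.importance(imp)
        p <- predict(bst_model, newdata = test_matrix)
        pred <- apply(matrix(p, nrow = 2), 2, which.max) - 1
        table(Prediction = pred, Actual = test$admit)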

Comments • 272

  • @Wissro (1 year ago, +3)

    Best video on the internet on XGBoost, you just saved my paper. Thanks a lot :)

    • @bkrai (1 year ago, +1)

      You're welcome!

    • @nyasha767 (1 year ago, +1)

      I agree 100% with you.

  • @Viewfrommassada (5 years ago, +1)

    Thanks a lot, Prof! You sent me the link to this video and it REALLY helps. But just as someone suggested in the comments, the parameters in the model are very KEY, and a more detailed explanation of them, and of the algorithm as a whole, would REALLY be APPRECIATED too. I am blessed to be a subscriber of your videos!

    • @bkrai (5 years ago)

      Thanks for your comments and suggestion!

  • @abhinavmishra7786 (6 years ago, +6)

    This video gave me a much higher level of clarity on the XGBoost model and parameter usage. Thanks a lot, Sir.

    • @bkrai (6 years ago)

      Thanks for comments!

  • @amaanraza2704 (5 years ago, +2)

    Hi Bharatendra, I derive a lot of value from your tutorials, which strike the right balance between being simple yet very useful. Love them!

    • @bkrai (5 years ago)

      Thanks for your feedback and comments!

  • @kartikrayaprolu9076 (4 years ago, +1)

    Such an elaborate explanation.
    Please keep posting such videos. They will be very useful for the community.
    I've benefitted a lot from this video.

    • @bkrai (4 years ago)

      Thank you, I will

  • @gavinwebster8737 (4 years ago, +1)

    Best clarity so far on XGBoost, it helped a lot in my final project and in learning more about this algorithm compared to GBM.

    • @bkrai (4 years ago)

      Thanks for comments!

  • @flamboyantperson5936 (6 years ago, +2)

    Respect to you, sir. The kind of knowledge you are sharing from Massachusetts is very, very helpful. Thank you so very much, Sir.

  • @anderswigren8277 (6 years ago, +2)

    You are a skillful tutor. Keep going, and Happy New Year!

    • @bkrai (6 years ago)

      Happy New Year 2018!

  • @vijaypalmanit (6 years ago, +2)

    Thank you so much. This is the video I have been looking for for a long time; I didn't find anything like it. You have explained everything in detail, and it's interesting too.

    • @bkrai (6 years ago)

      Thanks for comments!

  • @anigov (6 years ago, +3)

    Thank you Sir for making it so easy

  • @prahladbhat9516 (3 years ago, +1)

    This helped so much on a classification project I am doing. Much thanks!

  • @irbobable (6 years ago, +2)

    Fantastic tutorial, thank you!

  • @faisalmohammed672 (4 years ago, +4)

    Thank you for the tutorial.
    Given that you have a binary target, I was wondering why you haven't used objective = "binary:logistic" and eval_metric = "logloss".
    Is there a downside to using "multi:softprob" for a binary classification problem, when it is typically used for multiclass classification where n > 2? I'd appreciate it if you could help clarify this.
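
    For reference, a minimal sketch of the two formulations (both are standard xgboost options; values here are illustrative):

        # Binary formulation: one probability per observation
        params_bin <- list(objective = "binary:logistic",
                           eval_metric = "logloss")

        # Multiclass formulation with two classes: num_class probabilities
        # per observation, so predictions need reshaping afterwards
        params_multi <- list(objective = "multi:softprob",
                             eval_metric = "mlogloss",
                             num_class = 2)

    Both are valid for a two-class problem; binary:logistic is simply the more direct choice.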

  • @tamaraabzhandadze2712 (2 years ago, +2)

    That was a very good tutorial! I wonder if, and how, we could use cross-validation for choosing the eta, gamma, iteration, etc. parameters. I would be happy to have any suggestions.
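
    One option is xgb.cv() from the xgboost package. A sketch, reusing train_matrix from the video's code and an illustrative eta grid:

        for (eta in c(0.3, 0.1, 0.05)) {
          cv <- xgb.cv(params = list(objective = "multi:softprob",
                                     eval_metric = "mlogloss",
                                     num_class = 2, eta = eta),
                       data = train_matrix, nrounds = 200, nfold = 5, verbose = 0)
          best <- which.min(cv$evaluation_log$test_mlogloss_mean)
          cat("eta =", eta, "best iteration =", best, "\n")
        }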

  • @mayoordhokia (3 years ago, +1)

    After weeks of searching for videos on using XGBoost to predict a continuous variable, I could not find any decent videos, nor were any of them as well explained (and entertaining) as yours. Please make one for the community? Best wishes from London, UK.

    • @bkrai (3 years ago, +1)

      Thanks for the suggestion and comments; I'm adding this to my list of future videos.

  • @happylearning-gp (2 years ago, +1)

    Thank you for this tutorial. Awesome. The step-by-step explanations made things much easier to understand.

    • @bkrai (2 years ago)

      You're very welcome!

    • @bkrai (2 years ago, +1)

      You may also find this useful:
      czcams.com/video/GmkHvDs0GG8/video.html

    • @happylearning-gp (2 years ago, +1)

      @@bkrai Thank you very much

    • @happylearning-gp (2 years ago, +1)

      @@bkrai Thank you very much.
      When you find time, kindly have a look at my channel on R. Everything is like a standalone application:
      czcams.com/channels/DmEAmoLuyE0h61aGpthGvA.html

    • @bkrai (2 years ago)

      You are welcome!

  • @shadrackbadia1158 (1 year ago, +1)

    Very easy to follow, no errors in code, just great.🤓🙂

    • @bkrai (1 year ago)

      Great to hear!

  • @upskillwithchetan (4 years ago, +1)

    Thank you, Sir! Awesome explanation skills, with depth on the algorithm.

    • @bkrai (4 years ago)

      Thanks for your comments and finding it useful!

  • @sebastianvarela2190 (5 years ago, +1)

    Hi Sir, your videos are great. Let me ask you a question: I have read that it is possible to implement survival analysis (Cox regression) with the xgboost package, indicating "survival:cox" as the learning task parameter. I haven't found any tutorial on this. Do you know if extra work is necessary, for example specifying the time variable somewhere else? Thanks in advance.
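
    For what it's worth, a hedged sketch of xgboost's Cox convention (not covered in the video): the label itself carries the time information, with a negative sign marking right-censored observations, so no separate time variable is specified elsewhere. 'time', 'event', and 'x_matrix' are assumed names:

        label <- ifelse(event == 1, time, -time)   # negative = right censored
        dtrain <- xgb.DMatrix(data = x_matrix, label = label)
        cox_model <- xgb.train(params = list(objective = "survival:cox",
                                             eval_metric = "cox-nloglik"),
                               data = dtrain, nrounds = 100)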

  • @harishnagpal21 (3 years ago, +1)

    Thanks for the model. A big help for me.

    • @bkrai (3 years ago)

      Thanks for comments!

  • @angappanmaruthachalam3054

    Your explanation is awesome!

  • @TheLoggic (6 years ago, +2)

    Cheers, amazing video, mate!

  • @swathinandakumar415 (3 years ago, +1)

    Thank you, Sir, for explaining the model so well. I am doing something similar with my data. How can I show the probabilities of the predictors (similar to the one in a decision tree)?

  • @dicksang2 (5 years ago, +2)

    Very very informative. Thanks!

    • @bkrai (5 years ago)

      Thanks for comments!

  • @bhavanabhardwaj5253 (6 years ago, +6)

    Hello Sir, can you please share an example where the response variable is continuous?

  • @user-if2ww7sv8x (6 years ago, +2)

    Thank you for sharing.

  • @karimkardous5555 (6 years ago, +1)

    We can also widen the range of the y-axis by using the following lines:
    plot(e$iter, e$train_mlogloss, col = "blue", type = "l", ylim = c(0, 1))
    lines(e$iter, e$test_mlogloss, col = "green")
    legend("topright", legend = c("Training Error", "Testing Error"), lty = c(1, 1), col = c("blue", "green"))
    But I guess for the purposes of this video, not using the ylim parameter may be intentional and warranted.
    Thank you for the great video, as always.

  • @manaspradhan2166 (6 years ago, +2)

    Thank you, Sir. This is very helpful.

  • @musasall5740 (6 years ago, +3)

    Excellent!

  • @93divi (6 years ago, +1)

    Thank you, Sir...

  • @hans4223 (5 years ago, +1)

    Simply awesome and excellent.

    • @bkrai (5 years ago, +1)

      Thanks for comments!

  • @SmartMrSteve (3 years ago, +2)

    Thanks for the amazing XGBoost tutorial! I can't believe you make every machine learning application so easy. I'd really like your help figuring out how to apply XGBoost to time-to-event data; there are very limited resources on XGBoost with the Cox model. Do you have any suggestions? Thanks.

    • @bkrai (3 years ago)

      I don't have one at this time, but I have added it to my list.

  • @hilaav7449 (5 years ago, +1)

    Thank you, it was very helpful!!

    • @bkrai (5 years ago)

      Thanks for comments!

  • @sheeqariff7974 (5 years ago)

    Hi Sir. Your video is very good and easy to understand. I have one question: what classification algorithm is used in the xgboost package? I read on another website that the package includes "tree learning algorithms". Is it a decision tree algorithm? Thank you in advance for the clarification.
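
    For context, the base learner is selected by xgboost's booster parameter; the default, "gbtree", grows CART-style regression trees, which is what the video uses implicitly. A sketch (objective values illustrative):

        params_tree   <- list(booster = "gbtree",   objective = "binary:logistic")
        params_linear <- list(booster = "gblinear", objective = "binary:logistic")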

  • @hmachira1 (6 years ago, +2)

    Thank you so much

  • @sebastianvarela2190 (5 years ago, +1)

    Excellent video, thanks!

    • @bkrai (5 years ago)

      Thanks for comments!

  • @tathagataghosh5390 (10 months ago)

    Sir, can you please make a video on stacking models for different DL models? Thanks a lot for the informative videos, Sir.

  • @sebastianvarela2190 (5 years ago, +1)

    Hi Sir, let me ask you a question. In a binary classification context, how do you predict when it is not possible to know the values of the target variable in a forecasting scenario? I mean, you need to forecast a result and have a new dataset without the response variable; that is, you don't know if a student will be admitted or not, but you need to make a prediction using xgboost.
    I tried setting an outcome variable with a fixed value (0, for instance) in the "test set" (the new dataset without the response variable) to be able to run xgboost; however, the prediction is pretty inaccurate.
    Thanks very much!

  • @gurgenhovakimyan329 (4 years ago, +1)

    Thank you very much. You helped me a lot.

    • @bkrai (4 years ago)

      Thanks for comments!

  • @evansumido6191 (1 year ago)

    Hi Sir. What line of code should I add if I want the confusion matrix to also display the 95% CI and test p-value? Great lecture, thank you.
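
    One way to get those statistics is caret's confusionMatrix(), which prints the accuracy, its 95% CI, and the P-Value [Acc > NIR]. A sketch, assuming pred and test_label are the predicted and actual class vectors from the video's code:

        library(caret)
        confusionMatrix(factor(pred, levels = c(0, 1)),
                        factor(test_label, levels = c(0, 1)))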

  • @anshagarwal7020 (4 years ago, +1)

    Thank you for the tutorial; it really helped my understanding. I have a question: why can't we do dummy encoding for categorical variables in XGBoost?

    • @bkrai (4 years ago)

      You may try. It should work fine.

  • @chaitanyakmr (5 years ago, +1)

    Thanks a lot for the explanation.

    • @bkrai (5 years ago)

      Thanks for comments!

  • @MSS864 (3 years ago, +2)

    I am enjoying your videos, starting from the simplest to the more complicated ones! Thank you, Dr. Rai, for your great explanations. I have one question, though: when you divide the data into train and test sets, you use data[ind==1, ] and data[ind==2, ], and it is not clear to me how this magically works. What I see is data[x, y], where x can be blank or an integer from 1 to 400, and y can be blank or an integer from 1 to 4. Can you explain what is going on, or is there anything I am missing?

    • @bkrai (3 years ago)

      You can refer to this for explanation:
      czcams.com/video/RBojq0DAAS8/video.html

  • @jairjuliocc (5 years ago, +1)

    Very useful, thank you!

    • @bkrai (5 years ago)

      Thanks for comments!

  • @jojo23srb (5 years ago, +1)

    Thanks for the video!
    A quick question though: What's the motivation behind the 'prob' vector in 'ind

    • @bkrai (5 years ago, +1)

      prob is the probability. For more details about data partitioning, you can look at this link:
      czcams.com/video/aS1O8EiGLdg/video.html
      Also, date variables are handled differently; I'll probably do a video about it later.

  • @ft753 (4 years ago, +1)

    Thanks very much for this tutorial; it definitely made things easier to understand.
    I have a question regarding "objective" = "multi:softprob" in the parameters section. The admission problem in the example is a binary problem, right? So why should we use multi:softprob instead of binary:logistic? If I try the model with binary:logistic, my model fails.
    It would be great if you could help me out on when to use which objective! Thanks.

    • @bkrai (4 years ago)

      Multi works for 2 or more levels.

  • @gowrikaruppusami7757 (4 years ago, +1)

    Very excellent explanation, lots of thanks.
    I have one doubt: is it possible to use image data, especially satellite data?

    • @bkrai (4 years ago)

      For image data, deep learning is more effective. You can explore the 'deep learning' playlist on this channel.

  • @datascience8272 (6 years ago)

    Hello Sir,
    In a real scenario, where we have separate test data with no dependent variable, how will sparse.model.matrix work?

  • @tadessemelakuabegaz9615

    Dear Rai, I hope you are doing well. I have one question. I am building a machine learning model using the RandomForest and XGBoost algorithms. My data is a survey of samples drawn from a large population. It has a sampling weight, which is the number of individuals in the population that each respondent in the sample represents. How can I apply this sampling weight in my ML model? The data also contains strata and clusters. Do I have to keep the sampling weight, strata, and cluster variables with my features?

  • @jamesstevenson5002 (6 years ago, +2)

    Thanks a lot

  • @OrcaChess (5 years ago, +1)

    Thank you so much for your instructive and insightful tutorial!
    I have one question:
    Do I only need one-hot encoding for my inputs/features?
    What about the outputs: is xgboost able to forecast a categorical variable as a label?
    Or should I one-hot encode my labels as well?
    Kind regards,
    Jonathan

    • @bkrai (5 years ago, +1)

      For XGBoost, the response variable also needs to be numeric. In the example I used, admit is a factor variable, but since it has the two values 0 and 1 in numeric form, we didn't need to do anything. For further explanation about variables, you can also refer to this link:
      cran.r-project.org/web/packages/xgboost/vignettes/discoverYourData.html
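
      A small sketch of what that looks like with the assumed admission columns: one-hot encoding expands only the factor predictor, while the 0/1 response stays a plain numeric vector:

          trainm <- sparse.model.matrix(admit ~ . - 1, data = train)
          colnames(trainm)            # e.g. gre, gpa, rank1, rank2, rank3, rank4
          train_label <- train$admit  # already numeric 0/1, no encoding needed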

    • @OrcaChess (5 years ago)

      Thank you very much for your explanations and the link!
      In multiclass cases, what do you think is more suitable? Suppose we have one categorical variable with 10 classes (0 to 9), where every number is a class:
      1. Build one model to forecast this categorical variable, getting 10 different probabilities that sum to 1.
      2. Build 10 different models, each forecasting yes or no (0 and 1) for one of the 10 classes, and in the end take the class whose model gives the highest yes-probability as the forecast.
      Thanks in advance,
      Jonathan

  • @mecobio2 (5 years ago, +1)

    The code has room for improvement. For instance, when splitting the data, instead of sample() you can use createDataPartition(), in order to preserve the proportion of the categories in the Y variable; accuracy improves from 0.7066667 to 0.7375.
    Another improvement is to use, say, 10-fold cross-validation, with the caret R package and train().
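
    A sketch of the commenter's suggestion (caret assumed installed; the data frame and column names follow the admission example):

        library(caret)
        set.seed(1234)
        idx <- createDataPartition(data$admit, p = 0.8, list = FALSE)  # stratified split
        train_df <- data[idx, ]
        test_df  <- data[-idx, ]

        fit <- train(factor(admit) ~ ., data = train_df,
                     method = "xgbTree",
                     trControl = trainControl(method = "cv", number = 10))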

    • @bkrai (5 years ago)

      Thanks for sharing!

  • @happylearning-gp (2 years ago)

    I have a basic question. In logistic regression using the glm function, we get a model with the predictors included in it. But here, I don't know which predictors are considered in bst_model. Could you please guide me on how to extract those predictors from bst_model? Thank you very much.

  • @ConsuelaPlaysRS (5 years ago, +2)

    Thank you! I wish you would use caret more, though.

    • @bkrai (5 years ago, +1)

      Thanks for the suggestion!

  • @tadessemelakuabegaz9615 (2 years ago, +2)

    I have seen your lectures on logistic regression and randomForest as well; they are awesome. Do we need cross-validation in these ML methods? I haven't seen any cross-validation step in your lectures on LR, RF, and xgboost.

    • @bkrai (2 years ago, +1)

      I've split the data into train and test sets. But there is no harm in doing CV.

  • @seshasaiguna9937 (5 years ago)

    Can we use xgboost and adaboost for multiclass models?
    When using 'adaboost' I'm getting the following error:
    "Error: Dependent variables must have two levels"
    My dataset has 3 levels. Your inputs will be helpful and appreciated!

  • @gmm552 (3 years ago)

    How do I interpret Cover, etc.? Also, how can we do a grid search here for optimization?

  • @SaranathenArunE (5 years ago, +1)

    Thanks, Sir. Brilliant!

    • @bkrai (5 years ago)

      Thanks for feedback!

  • @OrcaChess (5 years ago, +1)

    Hello,
    is it possible to change the cutoff of the XGBoost model's predictions?
    In my model evaluation phase, one model's AUC in the ROC curve was higher than another's
    despite a clearly worse confusion matrix and accuracy. My guess is that this could be a cutoff issue.
    Kind regards,
    Jonathan

    • @bkrai (5 years ago)

      The ROC curve already makes use of various cutoffs to draw the curve; with one cutoff value we would get just one point, not a curve. Looking at the two curves can give you a better idea of the reasons behind the AUC difference.
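
      For a single confusion matrix, though, the cutoff can be changed when converting probabilities to classes. A sketch, assuming p1 holds the predicted probability of class 1:

          cutoff <- 0.4                           # instead of the default 0.5
          pred_class <- ifelse(p1 > cutoff, 1, 0)
          table(Predicted = pred_class, Actual = test_label)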

  • @akd9977 (5 years ago, +1)

    Thank you for explaining this clearly. If I have five character independent variables in the data frame and I don't want to drop them, how can I proceed? That is, how would the character values be converted to numeric data?

    • @bkrai (5 years ago)

      You can do one-hot encoding as shown in the video.

  • @deannanuboshi1387 (1 year ago, +1)

    Great video! Do you know how to get confidence or prediction intervals for xgboost in R? Thanks.

    • @bkrai (1 year ago, +1)

      You can get more details here:
      czcams.com/video/hCLKMiZBTrU/video.html

  • @popezee2029 (5 years ago, +1)

    Thanks for the instructive video, Sir. I am using a test set that does not contain the dependent variable column, because I am supposed to predict that column in a regression problem. How should I edit the script for test_label and the watchlist? Thank you.

    • @bkrai (5 years ago, +1)

      You can try this:
      new_matrix

  • @liwenling1287 (1 year ago)

    Thanks, Rai, for your helpful tutorial! It really helped me understand and run XGBoost in R. I have a question: if I want to handle a regression problem, can I use the same code, or are there parameters I should modify? Hope to hear from you soon.

    • @bkrai (1 year ago)

      You can see an example here:
      czcams.com/video/hCLKMiZBTrU/video.html
      You can also get some practice by doing this competition:
      czcams.com/video/Dn028hqWnUA/video.html

    • @liwenling1287 (1 year ago, +1)

      @@bkrai Really helpful! Thanks again for your detailed tutorial. Wish you all the best!

    • @bkrai (1 year ago)

      You are welcome!

  • @abhibhavsharma8706 (4 years ago, +1)

    Thank you, Sir.
    Please also give some guidance on how to install the LightGBM package in R and its uses.

    • @bkrai (4 years ago, +1)

      Thanks, I've added it to my list.

  • @WhySoSkyHigh (5 years ago, +1)

    absolute legend!

    • @bkrai (5 years ago)

      Thanks for comments!

  • @harishnagpal21 (5 years ago, +1)

    Thanks for the video. In what scenarios should we use eXtreme Gradient Boosting?

    • @bkrai (5 years ago)

      You can use it for better accuracy and faster runs compared to many other methods.

    • @harishnagpal21 (5 years ago, +1)

      Thanks a lot :)

  • @jjohn108 (3 years ago, +1)

    Great tutorials :)

    • @bkrai (3 years ago)

      Thanks for comments!

  • @shikevin3362 (4 years ago, +1)

    You are a legend!!!

    • @bkrai (4 years ago)

      Thanks for comments!

  • @adarsha1981 (6 years ago, +1)

    Hi Bharatendra, nice and very useful video. I have a question: in my case I have around 4.5 lakh (450,000) observations and 250 features. I am trying to run XGBoost; it's taking some time, which is OK, but I am not able to remove the XGBoost run. Note: my data is highly class-imbalanced, with 75% 0s and 25% 1s. Do you suggest using XGBoost here? Thanks!

    • @bkrai (6 years ago)

      I would suggest taking care of the class imbalance problem (CIP) before running XGBoost; it will improve accuracy significantly. Here is the link for CIP:
      czcams.com/video/Ho2Klvzjegg/video.html
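
      An XGBoost-side option worth knowing about (an addition, not from the video): with binary:logistic, the scale_pos_weight parameter reweights the positive class, as an alternative or complement to resampling:

          params <- list(objective = "binary:logistic",
                         eval_metric = "auc",
                         scale_pos_weight = sum(train_label == 0) /
                                            sum(train_label == 1))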

  • @saipri (2 years ago)

    Is there a video for checking the model using chi-square?

  • @supriyashinde5128 (4 years ago, +1)

    Thank you so much for the tutorial.
    I have a question:
    How do I plot the ROC curve and compute AUC on the same dataset? Can you provide the code for the ROC curve and AUC?

    • @bkrai (4 years ago)

      Here is the link:
      czcams.com/video/ypO1DPEKYFo/video.html
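
      A minimal sketch with the pROC package (one option among several; p1 is assumed to be the predicted probability of class 1):

          library(pROC)
          r <- roc(test_label, p1)   # builds the ROC curve
          plot(r)                    # ROC plot
          auc(r)                     # area under the curve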

  • @OrcaChess (5 years ago, +1)

    Hello Bharatendra Rai,
    did you make a video about setting up feature selection in R?
    It would be very useful when you have lots of features/inputs and you want to find out
    which of them are relevant, to determine a feature subset for the classifier.
    Kind regards,
    Jonathan

    • @bkrai (5 years ago, +2)

      I'll be doing one in August.

    • @OrcaChess (5 years ago, +1)

      Bharatendra Rai, looking forward to it! 👍 Thank you for your deep and to-the-point data science tutorials; I recommend them to every student in Karlsruhe who wants to run ML models in R.

    • @bkrai (5 years ago)

      Thanks for your comments and recommendations!

  • @nguyenphananhhuy416 (6 years ago)

    Is there any example of using XGBoost to predict a continuous variable? It seems this video covers the classification case.

  • @eliecerecology (5 years ago, +1)

    Thanks for the video. I have a question: why didn't you use "objective" = "binary:logistic"?

    • @bkrai (5 years ago)

      Yes, that should be more appropriate.

  • @supriyashinde5128 (4 years ago)

    Hello Sir, can we add hyperparameter tuning in XGBoost? If yes, then how?

  • @shinuignatious308 (5 years ago, +1)

    Thank you so much, Sir, for your in-depth tutorials. Could you please post the GitHub link for the code as well?

    • @bkrai (5 years ago)

      Link to the code is in the description area below the video.

    • @bkrai (4 years ago)

      Link to GitHub: github.com/bkrai/Top-10-Machine-Learning-Methods-With-R

  • @zacs7971 (4 years ago, +2)

    Hello Professor, thank you for this video.
    I'm receiving this error after running the same line of code you have at line 22. Any ideas on how to resolve it?
    Error in setinfo.xgb.DMatrix(dmat, names(p), p[[1]]) : The length of labels must equal to the number of rows in the input data

    • @bkrai (4 years ago)

      The message itself provides the clue: "the length of labels must equal the number of rows in the input data".

  • @vishwajitsen1434 (5 years ago, +1)

    Can you please upload videos on LSTM in Keras in R for numerical, categorical, and multiclass outcomes? It would be really great.

    • @bkrai (5 years ago)

      Thanks for the suggestion! It's on my list for future videos.

  • @Viewfrommassada (5 years ago, +1)

    Also, Prof Rai, I am building an ensemble model of Random Forest and XGBoost with R. My response variable has 2 levels, 'Low' and 'High', and is a factor in R. Can I build the model without converting these to 0s and 1s? Also, some of my predictor variables have levels A, B, C, D, and E, and are detected by R as factors. Do I have to convert these to zeros and ones even though they are factors before I use them?

    • @bkrai (5 years ago)

      When you use random forest, you do not need to convert categorical independent or dependent variables to numeric. But you definitely need numeric variables when using xgboost.

    • @Viewfrommassada (5 years ago)

      Your explanation helped a lot, thanks. I am building an ensemble of Random Forest and XGBoost on a classification problem. I have imbalanced data, so I used your video to balance ONLY my training data (I hope that's all I need to do in terms of balancing?). After balancing, I applied your one-hot encoding tutorial to both my balanced train data and my unbalanced test data. My XGBoost runs well, though I am yet to test it. BUT the problem is the Random Forest: when I pass the data through RF I get the error message below:
      Error in t.default(x) : argument is not a matrix
      In addition: Warning messages:
      1: In randomForest.default(x, y, mtry = mtryStart, ntree = ntreeTry, :
      The response has five or fewer unique values. Are you sure you want to do regression?
      2: In is.na(x) :
      is.na() applied to non-(list or vector) of type 'externalptr'
      What could be the solution? Your help is greatly appreciated, Prof Rai!

  • @Didanihaaaa (6 years ago, +1)

    First, I want to express my appreciation for such a helpful educational channel. Thanks a lot, Sir. I have a question regarding factor variables.
    Should I turn all integer values into factors? I got this error: "xgb.DMatrix(data = as.matrix(train), label = train_label) :
    REAL() can only be applied to a 'numeric', not a 'integer'"
    Could you please explain how you chose the rank column to turn into a factor and matrix variable?
    Best regards,

    • @bkrai (6 years ago, +1)

      I used rank as an example for dealing with factor variables. In your dataset if you have any factor variable, you can handle it in a similar manner.

  • @ramp2011 (5 years ago, +1)

    Would you consider using caret and calling xgboost there directly? Is there a benefit to using this direct method versus using caret? Thank you.

    • @bkrai (4 years ago)

      That should also work fine. As long as we use the same method, model performance is not likely to be significantly different.

  • @tadessemelakuabegaz9615 (2 years ago, +1)

    Hi Rai, hope everything is going well. I am currently working on an ML model with a continuous outcome variable and am new to regression models. I want to develop randomForest and XGBoost regressions. Can I ask for any reference videos and code related to regression with RandomForest and XGBoost?

    • @bkrai (2 years ago)

      Refer to:
      czcams.com/video/hCLKMiZBTrU/video.html

  • @adarsha1981 (6 years ago, +1)

    Hi Bharatendra, I tried searching for Bagging/Boosting and SMOTE videos in your playlists. Aren't they out yet? If not, waiting to see them :)

  • @tadessemelakuabegaz9615 (2 years ago, +1)

    Hi Rai, great job. I have one question: how can we construct ROC & AUC for the XGBoost model?

    • @bkrai (2 years ago, +1)

      See if this helps; it has more detailed coverage:
      czcams.com/video/ftjNuPkPQB4/video.html

    • @tadessemelakuabegaz9615 (2 years ago, +1)

      @@bkrai Thank you so much

    • @bkrai (2 years ago)

      You are welcome!

  • @nithinmamidala (5 years ago)

    Please give an explanation of the algorithm itself, so that it's easier to understand.

  • @Viewfrommassada (5 years ago, +1)

    Hi Prof., I have come again with a question, since I am learning a lot from your videos. Could you please explain the 'eta' parameter in xgboost in detail? Also, I want to report the AUC metric for my xgboost model and need your guidance; I have seen examples on Google but get errors when I try. I am giving a presentation on xgboost soon. Your help will be appreciated.

    • @bkrai (5 years ago)

      eta is the learning rate. When it is high, computation is faster, but you may miss the optimum. When it is low, computation is slower, but there is a better chance of hitting the optimum. Depending on the data size and problem, we try various values to explore what is best for a given problem. For AUC you can try this:
      czcams.com/video/ypO1DPEKYFo/video.html
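
      A sketch of that exploration, reusing the matrices from the video (values illustrative): a high eta with few rounds versus a low eta with more rounds, compared on test mlogloss:

          wl <- list(train = train_matrix, test = test_matrix)
          m_fast <- xgb.train(params = list(objective = "multi:softprob",
                                            num_class = 2, eval_metric = "mlogloss",
                                            eta = 0.5),
                              data = train_matrix, nrounds = 50, watchlist = wl)
          m_slow <- xgb.train(params = list(objective = "multi:softprob",
                                            num_class = 2, eval_metric = "mlogloss",
                                            eta = 0.05),
                              data = train_matrix, nrounds = 500, watchlist = wl)
          # compare test_mlogloss in m_fast$evaluation_log vs m_slow$evaluation_log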

  • @kartikrayaprolu9076 (4 years ago, +1)

    Hi Sir,
    Why have you used "-1" in the sparse.model.matrix function?
    Does it specify that the first column is not to be included, or does it exclude only one column, i.e., the response variable?

    • @dhavalpatel1843 (4 years ago, +2)

      No. The number of classes is 2, so if we put -1 those classes become 0 and 1; in this case 0 is not admitted and 1 is admitted.

    • @bkrai (4 years ago)

      Thanks for the update!

    • @bkrai (4 years ago, +1)

      Here is an update: "-1" removes an extra column which this command creates as the first column.
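
      A quick way to see what "-1" does, using a throwaway data frame:

          df <- data.frame(rank = factor(c(1, 2, 3)))
          model.matrix(~ rank,     data = df)  # (Intercept), rank2, rank3
          model.matrix(~ rank - 1, data = df)  # rank1, rank2, rank3 (no intercept)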

  • @phuongk.kttp-mtnguyenkieul2761

    Thank you for your valuable video. I have a question about the bst_model step; it does not work for me. My data has 122 classes. When I run it, R displays the error: "label must be in [0, num_class)". I have tried many nrounds values in the range 0 to 122, but nothing has worked. Hope to get your response. Many thanks!

    • @bkrai (4 years ago, +1)

      I think 122 is too many classes. Make sure you have enough data for each class; otherwise there could be issues.

    • @phuongk.kttp-mtnguyenkieul2761 (4 years ago, +1)

      @@bkrai Do you have any solution to handle this, Dr.?

    • @bkrai (4 years ago)

      Difficult to say much without looking at the data.

  • @nabaafrin9137 (5 years ago, +1)

    Can you please tell me which editor you used?

    • @bkrai (5 years ago)

      I use Final Cut Pro.

  • @harishnagpal21 (3 years ago, +1)

    I have one query. In this example we know the response variable in the test set, since we divided the actual data 80/20. But in real life, as in Kaggle competitions, we need to predict on a test set given by Kaggle, where the response variable is what must be predicted. How does that fit into the above code, i.e., how do we predict on an actual test set in xgboost? Thanks in advance.

    • @bkrai (3 years ago)

      This code will not change much. But you will definitely have to make some adjustments before you can correctly submit your file on Kaggle. You can refer to this example:
      czcams.com/video/4ld-ZfrCc0o/video.html

  • @jojo23srb (5 years ago, +1)

    Q: What's stopping someone from just changing all their variables to numeric types and skipping the one-hot encoding process altogether? Does it hurt the prediction?

    • @bkrai (5 years ago)

      I would suggest trying both and comparing the results.

  • @rachelfan4664 (5 years ago, +1)

    Hi Rai, my test data doesn't have the response variable; I need to predict it. What should I do with all the test_matrix steps?

    • @bkrai (5 years ago)

      You can artificially create it and fill it with zeros.
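
      A sketch of that workaround (the label passed to xgb.DMatrix is ignored by predict(), so the zeros affect only reported evaluation metrics, not the predictions themselves; test_df is the assumed unlabeled data frame):

          testm <- sparse.model.matrix(~ . - 1, data = test_df)  # no response in formula
          test_matrix <- xgb.DMatrix(data = as.matrix(testm),
                                     label = rep(0, nrow(testm)))
          p <- predict(bst_model, newdata = test_matrix)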

    • @rachelfan4664 (5 years ago)

      Thanks, Sir, will try.

  • @foram224 (4 years ago, +1)

    I have one question: if you have created sparse matrices for the train and test sets, why do you use as.matrix on trainm in xgb.DMatrix? The sparse matrix can also be used directly. I am confused about xgb.DMatrix and the preceding sparse.model.matrix step.
    Another question: what if the response variable is in position 43, not 1? Do you still use -1 in the sparse matrix?
    Thank you so much for the video; it's really nice, I just have questions specific to my dataset. Hoping for your reply, thanks.

    • @bkrai (4 years ago)

      For the 1st question, I would suggest trying it and seeing if it works. If it works, then you are fine.
      I didn't fully understand the 2nd question. Are you referring to code line 43?

    • @foram224 (4 years ago)

      @@bkrai I appreciate your reply. For my dataset, using as.matrix on the sparse.model.matrix output gave an error, so I am better off using the sparse.model.matrix variable directly in xgb.DMatrix. That is all clear now. You are getting mlogloss, but I was getting merror, even though I used the same parameters as yours.

  • @upskillwithchetan (4 years ago, +1)

    Hi Sir, I'm confused. At 4:18 you mention putting -1 because "admit" is the first column in the dataset, but according to this blog, www.analyticsvidhya.com/blog/2016/01/xgboost-algorithm-easy-steps/, "-1" removes an extra column which this command creates as the first column.
    Please confirm.

    • @bkrai (4 years ago)

      You are right. Once 'admit' appears before the ~ symbol, it is automatically excluded.

  • @gtmpai (4 years ago)

    I am not able to get $evaluation_log in bst_model. Is there anything I am missing?

  • @utkarshprajapati9876 (5 years ago, +1)

    Hi Sir, nice and very useful video. I want to ask: when I use the XGBoost algorithm, do I no longer need to use linear and logistic regression?

    • @utkarshprajapati9876 (5 years ago, +1)

      I want to use the XGBoost algorithm on this problem: www.kaggle.com/c/house-prices-advanced-regression-techniques

    • @bkrai (5 years ago, +1)

      It's better to try several methods and then see which one performs better.

    • @utkarshprajapati9876 (5 years ago, +1)

      @@bkrai Okay, Sir, thanks.

    • @utkarshprajapati9876 (5 years ago, +1)

      @@bkrai Sir, you are a really great man.

  • @haanda47 (6 years ago, +1)

    Sir, can you please upload a video on adaptive boosting in R? Thanks in advance.

    • @bkrai (6 years ago)

      Thanks for the suggestion, I've added it to my list.

  • @fantomraja9137 (5 years ago, +1)

    Thanks so much.

    • @bkrai (5 years ago)

      Thanks for comments!

  • @missakboyajian6446 (6 years ago, +2)

    Hi, thanks for the video. I think I have a problem: when I do feature importance, I am also getting the target column in it. My target column is 'dismissed', and I put it in the first column. This is how I am loading it:
    train

    • @bkrai (6 years ago)

      I think lines 3 to 6 are not needed.

  • @navdeepagrawal7819 (1 year ago, +1)

    Sir, how can we optimize hyperparameters in the case of the XGBoost algorithm?

    • @bkrai (1 year ago, +1)

      Refer to this:
      czcams.com/video/GmkHvDs0GG8/video.html