XGBoost Model in Python | Tutorial | Machine Learning

  • Published 4. 09. 2024

Comments • 43

  • @mattc3738
    @mattc3738 3 years ago +25

    How does this not have more views!? Excellent video, EXACTLY what I needed to finish my project at work. This video could have saved me 10 hours of head scratching if I had seen it sooner.

    • @harsh1kumar
      @harsh1kumar  3 years ago

      Thanks Matt. I am glad to know that the video helped.

  • @ThePaintingpeter
    @ThePaintingpeter 1 year ago

    I cannot overstate how clear and terrific this video is. Absolutely fantastic effort on your part. Thank you very much for doing this.

  • @lxkhati4272
    @lxkhati4272 2 years ago +1

    All the advanced terms are simply described. Thanks, Harsh.

  • @dehumanizer668
    @dehumanizer668 2 years ago +1

    Exactly what I needed. Explained very clearly. Thank You.

  • @mosherchtman
    @mosherchtman 1 year ago

    More videos [like this] that teach optimization of all the parameters in the model, please

  • @kiranchowdary8100
    @kiranchowdary8100 3 years ago +1

    Good video, sir. Thanks for making videos and educating us.

  • @riskamulliani3390
    @riskamulliani3390 5 months ago

    Thank you sir 🙏, this video is very helpful 😊

  • @harshchoudhary279
    @harshchoudhary279 1 year ago

    This video covers a lot of things in a short time.

  • @saisarath623
    @saisarath623 1 year ago

    Really nice video and explanation Harsh

  • @alexandergawrilow6255
    @alexandergawrilow6255 3 years ago +3

    Thank you for the great content.
    I'm wondering why you don't use early_stopping_rounds during grid search. That way you could set num_trees to a fixed big number (like you did later when building the final model) and wouldn't have to grid search over it. Also, with your approach you probably overfit during grid search (due to the high number of estimators) and only get the best parameters when using all of the 1000, 2000 or 3000 trees.
    In the final model, because you use early_stopping_rounds, a different number of estimators will be used, and therefore the optimal hyperparameters from the grid search are probably not the optimal hyperparameters for the final model. What do you think about this?

    • @harsh1kumar
      @harsh1kumar  3 years ago +1

      Hey Alexander, thank you for this good question. You are right, ideally we would want to use something like early_stopping_rounds during grid search. As far as I know, this feature is not available while performing grid search using sklearn. Grid search will check all the parameter combinations that have been specified.
      You are also right that there will be a difference in the number of estimators we get from grid search and from using early_stopping_rounds in the final model. I consider grid search an initial estimate of which hyperparameters give better results, but the final model can have slightly different values.
      Thank you for your interesting question :)
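
A minimal sketch of the two-stage workflow discussed in this thread (not the video's exact code): grid search with a fixed number of trees, then a final model with a large n_estimators capped by early stopping. X, y and the parameter values are placeholders, and depending on the XGBoost version early_stopping_rounds belongs in the constructor (1.6+) or in fit() (older releases).

    from xgboost import XGBClassifier
    from sklearn.model_selection import GridSearchCV, train_test_split

    # Placeholder split of the training data; X, y stand in for the competition data.
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

    # Grid search with n_estimators held fixed: GridSearchCV itself cannot
    # early-stop, which is the limitation discussed above.
    param_grid = {"max_depth": [3, 5], "learning_rate": [0.05, 0.1]}
    grid = GridSearchCV(XGBClassifier(n_estimators=500), param_grid,
                        scoring="roc_auc", cv=3)
    grid.fit(X_train, y_train)

    # Final model: a large n_estimators plus early stopping on a validation set,
    # so the effective number of trees can differ from the grid-search setting.
    final = XGBClassifier(n_estimators=3000, eval_metric="auc",
                          early_stopping_rounds=50, **grid.best_params_)
    final.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)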

  • @Sam98961
    @Sam98961 2 years ago

    Thanks for the video! Great learning experience.

  • @MrLordmaximus
    @MrLordmaximus 2 years ago

    This is a very well explained video!

  • @romaljaiswal8
    @romaljaiswal8 1 year ago

    Disliking this video because it’s too good and I don’t want others to know about it 😂😂

  • @LLoBBHa
    @LLoBBHa 1 year ago

    Great video thank you!

  • @Islam101_Uganda
    @Islam101_Uganda 17 days ago

    Thanks boss

  • @milanchetry1168
    @milanchetry1168 6 months ago

    eval_metric throws an error, can anyone suggest the reason?
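
One possible cause (an assumption, since the exact error isn't shown): recent XGBoost releases moved eval_metric from fit() to the estimator constructor, so the older fit(..., eval_metric=...) call now fails. A hypothetical fix:

    from xgboost import XGBClassifier

    # Older API (deprecated and later removed in recent XGBoost versions):
    # model.fit(X_train, y_train, eval_metric="auc", eval_set=[(X_val, y_val)])

    # Newer API: set eval_metric on the estimator itself.
    model = XGBClassifier(eval_metric="auc")
    model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)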

  • @v1hana350
    @v1hana350 2 years ago

    How does parallelization work in the XGBoost algorithm? Please explain it with an example.
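
Short answer to the question above: boosting is sequential (each tree corrects the previous ones), but XGBoost parallelizes the split search within each tree across CPU threads. A minimal sketch, with n_jobs as the thread count and the other values illustrative:

    from xgboost import XGBClassifier

    # Trees are built one after another, but histogram building and split
    # finding inside each tree run on multiple threads.
    model = XGBClassifier(n_estimators=500, tree_method="hist", n_jobs=4)
    model.fit(X_train, y_train)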

  • @vbcsaransekar9058
    @vbcsaransekar9058 3 years ago

    I appreciate your effort.

  • @pradyutmazumdar1441
    @pradyutmazumdar1441 2 years ago

    I have a doubt: during cross-validation, where we choose which model to use, I get a certain accuracy, but after hyperparameter tuning the accuracy jumps by 2%.
    Is this normal?
    This is in XGBoost.

  • @v1hana350
    @v1hana350 2 years ago

    I have a question about the XGBoost algorithm: how does parallelization work in XGBoost? Please explain it with an example.

  • @zetsuboulynn734
    @zetsuboulynn734 2 years ago

    Nice video! Thank you so much!
    Two questions: is there a way to download the notebook with outputs from Kaggle? And is it possible to train models like XGBoost with a GPU? The last time I tried there, the debugger suggested it was only possible with sequential models like neural networks.
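
On the GPU part of the question: XGBoost can train on a GPU, it is not limited to neural networks. A sketch, assuming a CUDA-enabled build such as a Kaggle GPU notebook:

    from xgboost import XGBClassifier

    model = XGBClassifier(tree_method="hist", device="cuda")  # XGBoost >= 2.0
    # model = XGBClassifier(tree_method="gpu_hist")           # older XGBoost versions
    model.fit(X_train, y_train)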

  • @javohirxusanov1229
    @javohirxusanov1229 2 years ago +1

    Hey man, you're doing a good job! Why did you stop making videos?

    • @harsh1kumar
      @harsh1kumar  2 years ago +2

      Thank you very much man. I will start uploading more videos from next month 😀

  • @vbcsaransekar9058
    @vbcsaransekar9058 3 years ago

    Really, awesome.

  • @fscode5021
    @fscode5021 1 year ago

    In my project I only get 45% accuracy in training and 44% in testing. What do you think I can do to get better accuracy?

  • @henilshah6962
    @henilshah6962 2 years ago

    How do you do it for multiclass classification?
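
A minimal multiclass sketch, assuming integer labels 0..k-1; the sklearn wrapper infers the number of classes from y, and the variable names are placeholders:

    from xgboost import XGBClassifier

    clf = XGBClassifier(objective="multi:softprob", eval_metric="mlogloss")
    clf.fit(X_train, y_train)           # y_train holds labels 0..k-1
    probs = clf.predict_proba(X_test)   # one probability column per class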

  • @livesinging3924
    @livesinging3924 3 years ago

    Great content...

  • @ratishr6003
    @ratishr6003 2 years ago

    Thank you, this was explained really well. I'm working on a scorecard model with over 400 variables. Can we use 'from xgboost import plot_importance' to print out the important features after hyperparameter tuning and training the model, and then re-run the model with a subset of features?

    • @shivankarora1264
      @shivankarora1264 1 year ago

      Hi,
      I'm working on the same thing.
      Please help me with the approach you used.
      Thanks
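
A rough sketch of the plot_importance step asked about in this thread; trained_model is a placeholder for whatever fitted model came out of tuning, and feature names show up automatically when the training data is a pandas DataFrame:

    from xgboost import plot_importance
    import matplotlib.pyplot as plt

    # Plot the top features by gain for a fitted XGBClassifier or Booster.
    plot_importance(trained_model, max_num_features=30, importance_type="gain")
    plt.show()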

  • @anmol_seth_xx
    @anmol_seth_xx 2 years ago

    The program takes too much time to run 😵
    But thanks to you, sir, for explaining the program and arguments very well.

    • @harsh1kumar
      @harsh1kumar  1 year ago

      You can try LightGBM. It may be faster depending on your context. I have a video for it on my channel.
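
For reference, a minimal LightGBM sketch (assuming the lightgbm package is installed); its sklearn-style interface mirrors XGBClassifier, so it is easy to swap in and compare speed:

    from lightgbm import LGBMClassifier

    model = LGBMClassifier(n_estimators=500, learning_rate=0.05)
    model.fit(X_train, y_train)
    preds = model.predict_proba(X_val)[:, 1]  # predicted probability of class 1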

  • @AkshayDudvadkar
    @AkshayDudvadkar 1 year ago

    Just wanted to know whether EDA and feature selection are not needed for XGBoost?

    • @harsh1kumar
      @harsh1kumar  1 year ago

      EDA should be done irrespective of the model. Feature selection can also help remove unnecessary complexity in the model. But the benefit of techniques like XGBoost is that they can take in a large number of features and give importance to the relevant ones. I would advise doing a first iteration with all possible features and then removing features with lower importance, while monitoring model performance metrics.
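
A rough sketch of the iteration described in the reply above, assuming X_train and X_val are pandas DataFrames; the bottom-quartile cutoff is purely illustrative:

    import numpy as np
    from sklearn.metrics import roc_auc_score
    from xgboost import XGBClassifier

    # First iteration: all features.
    model = XGBClassifier(n_estimators=500).fit(X_train, y_train)

    # Drop the least important quarter of features and re-train on the rest.
    imp = model.feature_importances_
    keep = X_train.columns[imp > np.quantile(imp, 0.25)]
    model2 = XGBClassifier(n_estimators=500).fit(X_train[keep], y_train)

    # Compare validation AUC before and after pruning.
    print(roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]))
    print(roc_auc_score(y_val, model2.predict_proba(X_val[keep])[:, 1]))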

  • @saumyen1
    @saumyen1 2 years ago

    I have a question: what are the two classes here that are being separated?

    • @harsh1kumar
      @harsh1kumar  2 years ago

      We are trying to identify which customers will make a specific transaction in the future. These customers are tagged as 1 in the data. For more details, see www.kaggle.com/competitions/santander-customer-transaction-prediction/overview

  • @saadarbani
    @saadarbani 2 years ago

    Where can I get the API of XGBoost?

    • @harsh1kumar
      @harsh1kumar  1 year ago

      Python API Reference: xgboost.readthedocs.io/en/stable/python/python_api.html
      For other languages, see the same website.

  • @alishazel
    @alishazel 1 year ago

    I think I'm the stupid one... the video is detailed but I fail to get it working... head-scratching moment in Spyder :(

  • @hunterlee9413
    @hunterlee9413 2 years ago

    Where is the data?

    • @harsh1kumar
      @harsh1kumar  1 year ago

      You can access data from this link: www.kaggle.com/competitions/santander-customer-transaction-prediction/data

  • @CharanSaiAnnam
    @CharanSaiAnnam 2 years ago

    Your justification for the learning rate is not right.