How to Prune Regression Trees, Clearly Explained!!!

  • Added July 8, 2024
  • Pruning Regression Trees is one of the most important ways we can prevent them from overfitting the Training Data. This video walks you through Cost Complexity Pruning, aka Weakest Link Pruning, step-by-step so that you can learn how it works and see it in action.
    NOTE: This StatQuest assumes you already know about...
    Regression Trees: • Regression Trees, Clea...
    ALSO NOTE: This StatQuest is based on the Cost Complexity Pruning algorithm found on pages 307 to 309 of An Introduction to Statistical Learning (with Applications in R): faculty.marshall.usc.edu/garet...
    For a complete index of all the StatQuest videos, check out:
    statquest.org/video-index/
    If you'd like to support StatQuest, please consider...
    Buying The StatQuest Illustrated Guide to Machine Learning!!!
    PDF - statquest.gumroad.com/l/wvtmc
    Paperback - www.amazon.com/dp/B09ZCKR4H6
    Kindle eBook - www.amazon.com/dp/B09ZG79HXC
    Patreon: / statquest
    ...or...
    YouTube Membership: / @statquest
    ...a cool StatQuest t-shirt or sweatshirt:
    shop.spreadshirt.com/statques...
    ...buying one or two of my songs (or go large and get a whole album!)
    joshuastarmer.bandcamp.com/
    ...or just donating to StatQuest!
    www.paypal.me/statquest
    Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
    / joshuastarmer
    0:00 Awesome song and introduction
    0:59 Motivation for pruning a tree
    3:58 Calculating the sum of squared residuals for pruned trees
    7:50 Comparing pruned trees with alpha
    11:17 Step 1: Use all of the data to build trees with different alphas
    13:05 Step 2: Use cross validation to compare alphas
    15:02 Step 3: Select the alpha that, on average, gives the best results
    15:27 Step 4: Select the original tree that corresponds to that alpha
    #statquest #regression #tree
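
    For readers who want to see these four steps in code: below is a minimal sketch using scikit-learn's built-in cost complexity pruning. The toy dosage/effectiveness data, the 10-fold CV, and all variable names are made up for illustration; sklearn's cost_complexity_pruning_path does the "find the candidate alphas" work for you.

      import numpy as np
      from sklearn.model_selection import cross_val_score
      from sklearn.tree import DecisionTreeRegressor

      # Made-up data, roughly in the spirit of the video's dosage example.
      rng = np.random.default_rng(0)
      X = rng.uniform(0, 40, size=(60, 1))                    # dosages
      y = np.sin(X.ravel() / 6) * 50 + rng.normal(0, 5, 60)   # effectiveness

      # Step 1: use all of the data to find the candidate alphas
      # (one alpha per pruned subtree in the weakest-link sequence).
      path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)

      # Steps 2 & 3: cross-validate each alpha and pick the one that, on
      # average, gives the lowest sum of squared residuals (via MSE).
      mean_scores = [cross_val_score(
                         DecisionTreeRegressor(random_state=0, ccp_alpha=a),
                         X, y, cv=10, scoring="neg_mean_squared_error").mean()
                     for a in path.ccp_alphas]
      best_alpha = path.ccp_alphas[np.argmax(mean_scores)]  # least negative = lowest MSE

      # Step 4: the final tree is built on all of the data with that alpha.
      final_tree = DecisionTreeRegressor(random_state=0, ccp_alpha=best_alpha).fit(X, y)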

Comments • 530

  • @statquest
    @statquest  3 years ago +74

    NOTE: To apply this method to a classification tree, replace SSR with Gini Impurity (or Information Gain or Entropy or whatever metric you are using).
    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

    • @speakers159
      @speakers159  2 years ago

      I read some reference books like Introduction to Machine Learning and Hands-On Machine Learning, but I didn't find the kind of detail about Decision Trees (or the other methods) that you covered!! Could you suggest some more references for a deeper understanding?

    • @speakers159
      @speakers159  2 years ago

      I really wanted to try out the offline material too, but I'm still a student and can't afford it

    • @speakers159
      @speakers159  2 years ago

      And thanks a lot. This is really the best place for me to learn about machine learning as my first source; the explanations are deep yet still accessible to a beginner.

    • @pradeeptripathi7366
      @pradeeptripathi7366  2 years ago

      @StatQuest I am not able to understand how cost complexity pruning works with classification. You say to replace SSR with entropy or the Gini index, but how is it calculated for the leaf nodes? For SSR we take the difference between the actual and predicted values (from the leaf node), square it, and sum it. In classification, we have predicted classes at the leaf node. Should we use predicted classes or actual classes for calculating the Gini index or entropy? I am confused. Please let me know.

    • @statquest
      @statquest  2 years ago +1

      @@pradeeptripathi7366 We use predicted and actual classes to calculate gini. For details, see: czcams.com/video/_L39rN6gz7Y/video.html

  • @andyjiang8988
    @andyjiang8988  4 years ago +202

    The channel with the lowest Gini score of likes vs. dislikes

    • @statquest
      @statquest  4 years ago +42

      You get a DOUBLE BAM for that comment. Funny! :)

  • @NIHIT555
    @NIHIT555  4 years ago +94

    Josh: you have truly cracked how to use technology (slides/basic animation) to change a way of teaching that has been stuck for decades. I wish all universities would take note from you and revise the way they teach.

    • @statquest
      @statquest  4 years ago +3

      Thank you very much! :)

    • @yogeshbharadwaj6200
      @yogeshbharadwaj6200  3 years ago +2

      Yes, 100% true: a lot of knowledge sharing with simple visualisation. As you mentioned, the way of teaching matters a lot... Thanks to our MASTER Josh Starmer once again for this awesome video/content!!!

    • @sagargoswami998
      @sagargoswami998  2 years ago +3

      @@statquest I can confirm this. I am pursuing a Masters Degree in Data Analytics Engineering. And I have this course that is giving me a headache: Statistical Modelling.
      Your videos have helped me a lot; this is the perfect stuff I was looking for. BAM!!!! BTW, I saw the video where you explained how your pop helped you and the StatQuest community in general, and I totally get why your videos are perfect. Double BAM!!!!!
      Seriously man, thanks a ton.....

    • @Moiez101
      @Moiez101  1 year ago

      Seriously, yes! I'm taking an online course from MIT.... brilliant faculty, but just oh so removed from the everyday experience of learning as a student. Their slides and teaching methods leave much to be desired. I don't understand anything they are teaching in stats or ML, but Starmer is saving my life atm.

  • @gayathrigirishnair7405
    @gayathrigirishnair7405  5 months ago +5

    This is the best explanation of regression trees that I could find online. Professors are always too mathematical and programmers are too practical. Your explanation is juusssst right. Thanks a bunch for this!

  • @alecvan7143
    @alecvan7143  4 years ago +11

    Rewatching after practising, I can appreciate the quality of your explanations even more. Thanks Josh :)

    • @statquest
      @statquest  4 years ago

      Thank you very much! Good luck with your practicing. :)

  • @tymothylim6550
    @tymothylim6550  3 years ago +6

    Thank you very much for this video! I really enjoyed the full step-by-step process of building the various trees using different alpha values and the use of cross validation to select the best alpha!

  • @jiaxuzhang3527
    @jiaxuzhang3527  1 month ago +2

    Absolutely brilliant videos!!! I watched everything from the 1st one to this one in the list and understood so many things that I never understood in school. I love your videos so much!

  • @eramitjangra4660
    @eramitjangra4660  3 years ago +1

    I searched a lot of places to start my ML project from scratch.
    Then I landed here.
    You nailed it. 🔥🔥🙏
    Now I am on the edge of completing my project.
    Thanks a lot.

    • @statquest
      @statquest  3 years ago

      Awesome! Glad my videos are helpful. :)

  • @triplefruition
    @triplefruition  4 years ago +7

    Oh my Buddha!
    I'm falling in love with the funny way you sound when you're explaining.
    Before I found your channel, my head was spinning round and round.
    I didn't know what to do with my learning, but you came in and took me by big surprise.
    You made the abstract concepts simple!
    Thank you!!!!! :)))))))

  • @aop2182
    @aop2182  4 years ago +11

    This is awesome. I remember being a bit confused when I was reading about tree-based methods in An Introduction to Statistical Learning. It is much easier to understand when I can visualize it rather than just read formulas. Thank you!

    • @statquest
      @statquest  4 years ago

      Hooray! I'm glad this video is helpful. :)

    • @vipul5340
      @vipul5340  4 years ago +1

      I am reading the same book now, and without this video series it would be impossible to understand.

  • @pushkarparanjpe187
    @pushkarparanjpe187  4 years ago +1

    So good! Consistently high quality across videos and time! Keep going. Many thanks!

  • @gunupurugirija7201
    @gunupurugirija7201  6 months ago +2

    lol, that intro is something anyone who watches Friends and StatQuest would get!! Love your content; it's the best machine learning tutorial material available.

  • @munawersheikh00
    @munawersheikh00  4 years ago +5

    I love the way you say "BAMM"!!! It gives great relief during the video :) Your style of teaching is great. The way you explain makes it very easy for us to understand. In my opinion: "A difficult subject made easy to understand with your video lectures!" Thank you very much.

  • @enlighteninginformation7647

    This video really helped me to clearly understand the concept. Thank you for this good work

  • @preranadas4037
    @preranadas4037  3 years ago +1

    Best video on pruning and tree selection to date!!!!!

  • @LQNam
    @LQNam  2 years ago

    Thanks, Josh Starmer. Using the train + test data to find the list of alphas, and then using K-fold CV on the training data to find the optimal alpha, leads to data leakage.

  • @mujeebrahman5282
    @mujeebrahman5282  4 years ago +3

    The good thing about his videos is you just have to watch any video once and the concept will not leave you for a long time.

  • @pratyanshvaibhav
    @pratyanshvaibhav  1 year ago +3

    Sir, I am learning ML from your videos, and every day I am compelled to comment on the beauty with which the concepts are explained.. and the best part is you still clear our doubts even after 3 years.. For those who don't know, sir has also written a book, which is very good.

    • @statquest
      @statquest  1 year ago +1

      Thank you very much! :)

    • @pratyanshvaibhav
      @pratyanshvaibhav  1 year ago

      Sir, can I connect with you on LinkedIn?

    • @statquest
      @statquest  1 year ago +2

      @@pratyanshvaibhav LinkedIn limits the number of connections anyone can have, and I've hit that limit.

    • @pratyanshvaibhav
      @pratyanshvaibhav  1 year ago +1

      Okay sir

  • @dengzhonghan5125
    @dengzhonghan5125  3 years ago +1

    Thanks, Josh, every time I watch your video, I feel like the concept is very easy to understand! lol

  • @thanhtungnguyen7500
    @thanhtungnguyen7500  4 years ago +1

    Thank you Josh for your very easy-to-understand explanation & lovely rhythm.

  • @ldk5007
    @ldk5007  1 year ago +1

    What beautiful content!
    I'm not an English speaker, but this video is more helpful than the Korean lectures provided by the college I attend.

  • @hellochii1675
    @hellochii1675  4 years ago +2

    As always the clearest explanation! Thank you so much 😊 and BAM BAM BAM

  • @user-xn7qt3zl4m
    @user-xn7qt3zl4m  4 years ago +2

    The Best! The Best! The Best! video I've ever seen about Tree Pruning.
    Thanks a lot. Now I got the concepts.
    BAMM!

    • @statquest
      @statquest  4 years ago

      Hooray!!! Thank you very much! :)

  • @apurvagupta6217
    @apurvagupta6217  3 years ago +2

    The way you explain, and the amount of effort you put into the videos, is great. I have learnt a lot from you, sir. I always feel so positive and motivated while learning from you. Thanks a lot🎈

  • @subashp7925
    @subashp7925  8 months ago +1

    Excellent explanation; you are a master of teaching. Thank you so much for your valuable effort.

  • @definitelynosebreather
    @definitelynosebreather  3 years ago +1

    Damn. I was finding only scientific articles and having trouble understanding CCP; now you've made it very clear! Thanks.

  • @gundamdhinesh5379
    @gundamdhinesh5379  3 years ago +1

    Really love your content. Must-watch content for any learner.

  • @wabsy845
    @wabsy845  3 years ago +3

    It is such a great explanation. Super helpful! Clear and fun! Really appreciate your time in making the video. Thank you!

  • @flaviofreire9323
    @flaviofreire9323  3 years ago +1

    Love the song in the beginning!!

  • @visheshkushwaha9446
    @visheshkushwaha9446  4 years ago +1

    Explanation is TRIPLE BAM!!!

  • @rashigupta1813
    @rashigupta1813  3 years ago +1

    BEST CHANNEL! No, I'm not just saying it, I'm shouting it!

  • @BalistikJumbo
    @BalistikJumbo  4 years ago +1

    As always awesome! Thank you Josh!!! Horraaaayyyy

  • @charlottel9534
    @charlottel9534  4 years ago +1

    Thank you so much for posting this!

  • @theblindcritic5876
    @theblindcritic5876  4 years ago +1

    This is brilliant work! Thanks a ton!

  • @dhananjaykansal8097
    @dhananjaykansal8097  4 years ago +50

    Ahhh, Phoebe from Friends, aka Smelly Cat. Haha, good one Josh.

  • @marieblanchemanche9217
    @marieblanchemanche9217  1 year ago +1

    Love the reference to Phoebe! Also, thank you; all your videos are very helpful.

  • @auzaluis
    @auzaluis  1 year ago +4

    I don't know why I spend a lot of time googling if I always end up watching statquest haahahha

  • @chirags9774
    @chirags9774  4 years ago +2

    Best intro of all the statquest videos which I have seen 😍

  • @mathematicalninja2756
    @mathematicalninja2756  4 years ago +3

    This channel is a gold mine, I am telling ya :D
    Can you cover the Box-Cox transformation (power transformations)?

  • @amarakbar2374
    @amarakbar2374  4 years ago +1

    Wonderful explanation. Thank you very much

  • @rrrprogram4704
    @rrrprogram4704  4 years ago +1

    We love you DOSS.. I hope I too will one day be a Patreon member.

  • @karannchew2534
    @karannchew2534  3 years ago

    (Making notes for my own future reference)
    The Tree Score, SSR + αT (where T is the number of leaves), is used to create a set of pruned trees.
    Then, apply the data (via cross validation) to all the pruned trees.
    The pruned tree (and its α value) that gives the lowest SSR on the testing data set is the winner.
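
    For anyone who wants to see the arithmetic, here is a tiny sketch of the Tree Score in Python. The SSRs, leaf counts, and alpha values are all made up for illustration:

      # Tree Score = SSR + alpha * T, where T is the number of leaves.
      def tree_score(ssr, n_leaves, alpha):
          return ssr + alpha * n_leaves

      # Made-up numbers: a full tree (4 leaves) vs. a pruned tree (3 leaves).
      for alpha in (0, 10_000, 30_000):
          full = tree_score(300_000, 4, alpha)
          pruned = tree_score(320_000, 3, alpha)
          print(alpha, full, pruned, "pruned wins" if pruned < full else "full wins")

    As alpha grows, the leaf penalty eventually outweighs the pruned tree's larger SSR, which is exactly why larger alphas select smaller trees.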

  • @dongli7157
    @dongli7157  4 years ago +2

    Hi Josh, these explanatory videos are incredible. Thanks for the great work!

  • @yigithangediz2769
    @yigithangediz2769  1 month ago +1

    excellent explanation. thank you

  • @carabidus
    @carabidus  4 years ago

    Josh, the videos on this channel are nothing short of superb. I have only one suggestion: how about a dark theme for these presentations? That white background is like a supernova, especially on my 55" TV.

  • @user-bz8nm6eb6g
    @user-bz8nm6eb6g  2 months ago +1

    Great explanation!

  • @Bennilenny
    @Bennilenny  4 years ago +13

    Your intros make me smile :)

  • @user-jj3we9jv9i
    @user-jj3we9jv9i  8 months ago +2

    Liked and commented to help you with the YouTube algorithm.

  • @rrrprogram8667
    @rrrprogram8667  4 years ago +30

    There are some notifications... right when they show up on your phone... your feelings say... I'm going to learn something today...
    MEGAAA BAMMMMM

  • @arda8206
    @arda8206  3 years ago +2

    One good idea for cross-validation might be to first split the data into train and test sets, and then split the train set again into train and validation sets. That way, we can guarantee that our test set is totally new to the model, which will result in more realistic scores.

  • @knightedpanther
    @knightedpanther  1 year ago +1

    Thank you so much for this amazing video. Very Amazing!

  • @longma7042
    @longma7042  4 years ago +1

    Great, thank you for your work. Very clear.

  • @hemlatasharma5288
    @hemlatasharma5288  2 years ago +1

    Hello Josh, thanks for this amazing video; I am implementing cost complexity pruning on the basis of this video. I have one question, though: how do you build a decision tree using a particular value of alpha on the training data (during k-fold cross validation)? How does alpha help?
    I am working on a classification decision tree; here's what I do:
    1. Use all the data to build the full tree, get all subtrees, and for every subtree get a value of alpha.
    Misclassification error of one subtree = sum of the Gini impurities of all its leaf nodes.
    2. Divide the data into 10 folds; for each fold:
    - build a decision tree using each value of alpha and the training set. How? What role does alpha play here? I can grow a tree and then get subtrees without alpha.
    - calculate the test error (1 - accuracy) for each subtree.
    - select the subtree, represented by its alpha, having the lowest test error.
    3. Selected alpha = the average alpha across all folds.
    4. Pruned tree = the tree that has alpha = the selected alpha.
    I apologise if this is a stupid question :)

    • @statquest
      @statquest  2 years ago +2

      Say the full-sized tree has 12 leaves and we are restricting ourselves to building a tree with only 10 leaves. Which 2 leaves should we remove? Alpha helps answer that question. We want to remove the 2 leaves that will give us a better tree score than the original tree.
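
      As a tiny illustration of that idea, here is a sketch in Python. The split names and SSR increases are made up; the point is just that weakest link pruning collapses the split whose removal hurts the SSR the least:

        # Made-up increases in SSR if we collapse each prunable split:
        candidate_collapses = {"split A": 5_000, "split B": 12_000, "split C": 2_500}

        # Remove the pair of leaves whose removal increases the SSR the least.
        weakest = min(candidate_collapses, key=candidate_collapses.get)
        print("collapse", weakest)  # -> collapse split C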

    • @hemlatasharma5288
      @hemlatasharma5288  2 years ago +1

      ​@@statquest Thank you Josh! My problem is solved. Thank you again for this great video. I hope you know that this is the ONLY resource out there that explains cost complexity pruning so nicely.

    • @statquest
      @statquest  2 years ago

      @@hemlatasharma5288 Thanks!

  • @heerbrahmbhatt6917
    @heerbrahmbhatt6917  2 months ago +1

    I'm 50% here for stats and 50% here for the sound effects!

  • @yulinliu850
    @yulinliu850  4 years ago +1

    Thanks Josh!

  • @JoRoCaRa
    @JoRoCaRa  1 year ago +1

    you are awesome! clear! to the point!

  • @shashanksundi5669
    @shashanksundi5669  3 years ago +1

    Thank You !! Just perfect :)

  • @Rectalium
    @Rectalium  4 years ago +2

    Your channel is amazing man. Great job

  • @stedev2256
    @stedev2256  2 years ago

    Hello,
    first of all, thanks for the great material you have produced and shared; it is certainly among the clearest and most effective I've come across.
    My questions are about the cross-validation trees used to determine the right alpha values.
    As a premise, if I understood correctly, we first determine candidate alpha values by:
    a) creating a "full" tree from the full training+testing datasets
    b) producing the corresponding family of "pruned" versions (and I guess assessing their SSRs in preparation for the next step) based on the morphology of the "full" tree (meaning, all possible pruned trees are considered - is that correct?)
    c) identifying the candidate alpha values as those at which the "full" tree's score becomes higher than that of one of the pruned versions.
    Assuming the above is correct, when we move on to cross-validation in order to ultimately determine the right alpha, I understand that we resample a training set (and a corresponding test set) a number of times.
    Each time, we build a new tree from the training set, plus its associated set of pruned versions (let me call these trees a "cross-validation family of trees" (CVFTs)), and assess their SSRs based on the test set for the current round, in order to ultimately calculate the actual alpha to use.
    First question: how come every CVFTs in your slides has a number of members that equals the number of candidate values for alpha?
    Couldn't a resampled training set give rise to trees with more or even fewer leaves - and corresponding pruned versions - than the tree that was used to identify the candidate alpha values? In that case, the candidate alpha values might be larger or smaller in number than the possible number of trees in the CVFTs at hand.
    I imagine that a possible answer is that the number of members in a CVFTs can actually be different from the number of candidate alphas, and that the pruned trees in a CVFTs are actually identified through their Tree Scores when each of the candidate alpha values is applied -- if so, I guess the issue is that perhaps this mechanism does not stand out 100% from the presentation...
    Second question: if we assess the trees in each CVFTs only by their SSRs, wouldn't the tree with more leaves (therefore alpha=0) always win?
    Thanks much

    • @statquest
      @statquest  2 years ago

      What you wrote for b) "all possible pruned trees are considered" is not correct. When we remove a leaf, we don't just create all possible subtrees with one leaf removed. Instead, we pick the one subtree that, when we remove one leaf, results in the smallest increase in the sum of squared residuals.

    • @stedev2256
      @stedev2256  2 years ago

      @@statquest
      Josh, OK, that makes sense -- so this is repeated on each new subtree to produce the set of trees where the candidate alpha values are then formulated as at minute 13:03, correct?
      If so, are my subsequent questions still standing?

    • @statquest
      @statquest  2 years ago

      @@stedev2256 Each time we do cross validation we get a new "full sized tree", which may have a different size than the original. We then use the pre-determined alpha values to prune that new tree and use the test dataset to find out which tree (and alpha value) is best for that iteration.
      As for your second question, this is where the "testing" data comes in handy. A full sized tree with the most leaves (and alpha=0) will probably overfit the training data, and thus, do a pretty bad job predicting the testing data. So in practice, the full size tree (with alpha = 0) performs great with the training data (low SSR) but poorly on testing data (high SSR).

    • @stedev2256
      @stedev2256  2 years ago

      @@statquest
      Josh,
      thanks, I was actually rephrasing / correcting my last post, and clarified a number of things to myself while doing that... I didn't think you could see my second post while I was editing it... sorry.
      But it was not all in vain, as what you wrote last confirms what I was getting to while revising my question and in light of your previous answer, and things seem clear now.
      Thanks much

    • @statquest
      @statquest  2 years ago +1

      @@stedev2256 bam! :)

  • @mcapro
    @mcapro  4 years ago +1

    You are the best.

    • @statquest
      @statquest  4 years ago

      Thank you, and thank you for supporting me! :)

  • @Serenity_Whisper_Music
    @Serenity_Whisper_Music  1 year ago +1

    Thanks so much for your video.
    I've watched it 8 times and still have one question.
    It is about 13:17~14:08.
    Could you possibly explain in more detail the sentence at
    13:17, "Use the alpha values we found before to build trees (full and sub) that minimize the tree score"?
    My questions about this sentence are:
    1. How can we use alpha in the tree-building process?
    - I thought the way we build trees (full and sub) is the same as in your 'regression tree' video.
    *I understand the tree score, alpha, and the role alpha plays in changing the tree scores of different-sized trees (full and sub trees);
    my question is about the role of alpha (from the whole data) in creating the trees.
    2. If we can build the trees in the same way as in the 'regression tree' video, why do we need the process at 13:17~14:08?

    • @statquest
      @statquest  1 year ago

      The idea is that when alpha is 0, we build a full tree just like in the original regression tree video. Then we increase alpha to the first value and that causes us to prune that tree a little bit. Then we increase alpha some more and then we prune the tree some more, until we've used all of the values from alpha we identified earlier.

  • @dingusagar
    @dingusagar  4 months ago

    Thanks a lot for this. I came here after getting confused reading about this concept in a book. I am inspired by your teaching style. Teaching by example is the best way to transfer knowledge without losing the audience at any point.
    May I ask how much time you spend creating a tutorial like this? Also, what kind of tools do you use to make these videos?

    • @statquest
      @statquest  4 months ago +1

      Each video takes a long time - maybe 6 weeks or more. And I talk about how I do everything in this video: czcams.com/video/crLXJG-EAhk/video.html

  • @dhananjaysawai5087
    @dhananjaysawai5087  4 years ago +1

    Best Explanation Dam !!!

  • @kerimbasbug
    @kerimbasbug  2 years ago +1

    Perfect!

  • @divyagupta432
    @divyagupta432  4 years ago

    Please publish a session on reduced error pruning also.

  • @tamaskiss3237
    @tamaskiss3237  1 year ago +1

    Thank you for these videos Josh, I really love learning from them. Just one question: when we do the cross validation, shouldn't the alphas be different from those found on the full training data, and also differ across the cross-validation sets? If so, how should we decide which alpha gets the most votes, since they are basically different for every training set?

    • @statquest
      @statquest  1 year ago +1

      To be honest, I don't know how it is implemented in practice, but I would guess that the alphas are in comparable ranges.

  • @soniasu2744
    @soniasu2744  1 year ago +1

    you are literally doing god's work

  • @skumarr53
    @skumarr53  4 years ago +1

    Just a random thought: what if we prune the tree directly based on the SSR computed on the validation set, instead of adding a penalty? Either way, the tree that works well on the validation set is selected, so why add the penalty? Does it help control fluctuations in the number of leaves selected across cross-validation folds during hyperparameter tuning?

  • @Monia77777
    @Monia77777  4 years ago +2

    Friends reference - cherry on top! :)

  • @ahmedrejeb8575
    @ahmedrejeb8575  2 years ago +1

    this guy is a living legend ❤

  • @ishikajohari1508
    @ishikajohari1508  2 years ago +1

    You got me at Smelly Stat!

  • @andersk
    @andersk  2 years ago +1

    At 12:20, let's say the right-hand side of the tree had a node instead of just a leaf, and that node led to two leaves. In this situation, what would be pruned first to create the pruned comparison tree: the two leaves at the bottom of the left side, because it's deeper? Or the two leaves at the bottom of the right? Or would all ends with two leaves be pruned at the same time?

    • @statquest
      @statquest  2 years ago

      We always remove the leaves that result in the smallest increase in SSR.

  • @jcatlantis
    @jcatlantis  4 years ago +2

    How are new trees built by imposing the previous alpha values? Maybe it is not possible to find ever smaller trees with reduced Tree Scores for the fixed alphas. Before, alpha was a parameter, but during cross validation it is a constraint :(

  • @hellochii1675
    @hellochii1675  4 years ago +2

    Happy early Thanksgiving 🦃 💥 💥 💥

    • @statquest
      @statquest  4 years ago

      Thank you! I'm sooooo excited about the holiday. :)

  • @bobbyfischer1672
    @bobbyfischer1672  3 years ago +3

    At 12:05 you said that the full tree has the lowest tree score when alpha = 0.
    By default, pruned trees have a higher SSR. Won't increasing alpha increase their tree scores even further, instead of lowering them?

    • @statquest
      @statquest  3 years ago +2

      Increasing alpha increases the scores of all of the trees (it never lowers the scores). However, remember, we multiply alpha by the number of leaves in a tree. So a large tree will get a much larger penalty (alpha * number of leaves) than a smaller tree.
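
      To see this with made-up numbers: at alpha = 10,000, a 10-leaf tree carries a penalty of 10 × 10,000 = 100,000, while a 2-leaf tree carries only 2 × 10,000 = 20,000. So the big tree's SSR would have to be more than 80,000 lower than the small tree's just to break even, and raising alpha keeps tilting the comparison toward smaller trees.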

    • @bobbyfischer1672
      @bobbyfischer1672  3 years ago +4

      @@statquest Actually, I just figured out that we have to increase the alpha on the previous tree structure to see the comparison. I thought we only needed to increase the alpha on the pruned trees (to make them smaller).
      Thanks for the reply.

    • @graceqin5024
      @graceqin5024  2 years ago +1

      @@statquest Hi Josh, I am a beginner in learning data analytics. First, I have to say our professor uses your videos in class and they really helped in understanding the course materials! As for the problem mentioned by Bobby, I also had a problem with it until I saw your interaction here. I was stopped by "increase alpha until pruning leaves will give us a lower tree score." My problem was the same as Bobby's: I thought increasing alpha only increases the tree scores, so where should I start pruning leaves? Alpha and T got fuzzy in my mind. However, when I thought "increasing alpha only increases the tree scores," I was comparing with just the full tree, so the increase of the tree scores seemed to have no stopping point. Actually, the stop is the SSR of the next tree, where the leaves got pruned. At that point, the alpha is the knot which will be assigned to the next tree. (I hope I am right; please correct me if not.) Having this in mind, when I look at your words "increase alpha until pruning leaves will give us a lower tree score," I understand them completely and I think the phrasing is very accurate. I would recommend adding some explanation here or making it more obvious in the arrangement of the graphical illustrations in the video. Very lengthy comment; I was trying to organize my thoughts. Thank you again for the wonderful videos!

    • @statquest
      @statquest  2 years ago

      @@graceqin5024 I'm glad you were able to understand it! BAM! :)

  • @varaddingankar3794
    @varaddingankar3794  4 years ago +1

    @Josh Starmer Will you please consider creating a quest on implementing the above explanation in R?
    It would be really helpful!!!
    P.S. Great Quest :)

  • @rappa753
    @rappa753  3 years ago

    Thanks for the great video. One question though: why is the full-sized tree built from all of the data (see 11:25) and not just the training data? Couldn't this potentially cause problems w.r.t. leakage?

    • @statquest
      @statquest  3 years ago

      You can always create a validation dataset and hold onto that until the end.

    • @rappa753
      @rappa753  3 years ago

      I thought as much, but I was a bit astonished that using all the data was emphasized. Thanks for the clarification.

  • @bibiworm
    @bibiworm  3 years ago

    A quick question, please. In this example, the tree is not balanced, in the sense that the right subtree is a lot deeper. What if the left subtree were as deep as the right subtree: how would we choose which side of the internal node to collapse, or in other words, which side's leaf nodes to delete? Based on what is shown at 12:33, we should go with whichever gives us the lower tree score, right?

  • @jcatlantis
    @jcatlantis  4 years ago

    What about the order of removing leaves? If we have a huge tree, do we need to generate all possible combinations of subtrees and alpha values?

    • @statquest
      @statquest  4 years ago

      If you know the sum of squared errors for each node in the tree, you can systematically remove leaves/splits for a specific value of alpha without having to generate all possible trees.
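
      For the curious, here is a minimal sketch of that idea in Python. The nested subtrees and their SSRs are made up; the point is that each step of weakest link pruning implies a threshold value of alpha past which the smaller tree has the better Tree Score:

        # Nested subtrees from weakest link pruning, largest to smallest:
        # (SSR, number of leaves), with made-up SSRs.
        subtrees = [(300_000, 4), (320_000, 3), (400_000, 2), (700_000, 1)]

        # Setting SSR_big + a*T_big = SSR_small + a*T_small and solving
        # gives a = (SSR_small - SSR_big) / (T_big - T_small).
        for (ssr_big, t_big), (ssr_small, t_small) in zip(subtrees, subtrees[1:]):
            a = (ssr_small - ssr_big) / (t_big - t_small)
            print(f"alpha >= {a:.0f}: prefer the {t_small}-leaf tree")

      So for a given alpha, you can jump straight to the right subtree in the sequence instead of enumerating every possible pruned tree.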

  • @bibiworm
    @bibiworm  3 years ago +2

    At 12:33, the video reads "we increase alpha again until pruning leaves will give us a lower tree score". My question is: lower than what? My understanding is lower than the tree score of the full-sized tree at that specific alpha value, in this example at alpha = 10,000, because we have already established that the full-sized tree has the smallest tree score at alpha = 0. Similarly, at 12:43, the third tree at alpha = 15,000 is chosen because it has a lower tree score than the second tree at alpha = 15,000. Please let me know if this is correct. Thanks.

  • @trupologhelper7020
    @trupologhelper7020  1 year ago +1

    Hey, Josh! Is it ok that we are using train+test data to find the alpha values? I mean, aren't we peeping into the future: would we only know good thresholds by using the test sample (not only train), or am I wrong? Thank you

    • @statquest
      @statquest  1 year ago

      You can set aside a set of data for final validation.

  • @r_793
    @r_793  2 years ago

    Regarding choosing α (starting at 11:19): when we fit a new regression tree to the FULL data, does this not cause us to 'overfit' α to some extent? I was wondering what would happen if we did the following:
    i) Split the data k different ways into a set that we find α for and a set that we ignore.
    ii) On the set that we find alpha for, we get [α11, α12, α13] (the first set of α's such that we get better tree scores cutting the tree by 1, 2, and 3 levels respectively) up to [αk1, αk2, αk3].
    iii) We then take the average α for each cut, so [(α11+α21+...+αk1) / k, (α12+α22+...+αk2) / k, (α13+α23+...+αk3) / k], as our set of final α's.
    iv) Perform K-Fold Cross Validation using the above to see which α gives the lowest SSR for its optimal tree.
    Would my method make little to no difference? Or is my method overfitting more in some sense? Let me know what you think!

  • @mirroring_2035
    @mirroring_2035  1 year ago

    So, a question. We learned previously that cross validation is used to test the model on different "blocks" of the test set. But in this case you are advocating for cross validation to be used for hyperparameter tuning. Does that mean the test sets remain constant?

    • @statquest
      @statquest  1 year ago

      A lot of people ask about this, and I could probably have worded things much better. The way I see it is that we have all of the data and we can split that into "all the data we want for training" and "all the data we want for testing". We then build a tree and prune etc. using all the data we want for training, and test against the testing data.

  • @beshosamir8978
    @beshosamir8978  1 year ago +1

    Hi Josh, I hope you answer my question; I have been searching for 3 days now and got nothing.
    I have 2 problems:
    1. How do I determine alpha when there is more than one leaf at the bottom of the tree? (i.e., you said to increase alpha until pruning the leaf gives a lower score.) If I have more than one leaf at the last level of the tree, which one should I cut? Or should I look at all subtrees every time I increase alpha? It seems like that would get very complex.
    2. In the implementation, when I give the model the ideal alpha to build the decision tree, how will the model know, at every step it takes while building, that it will lead to the subtree related to this alpha?
    Finally, you are amazing; I have really enjoyed every lesson I have taken from this channel.

    • @statquest
      @statquest  1 year ago +1

      1) You remove the leaf that results in the smallest increase in SSR.
      2) You build the full tree, and prune just like we did before.

    • @beshosamir8978
      @beshosamir8978  1 year ago

      @@statquest
      Thank you Josh, I really appreciate you taking the time to answer my questions.

  • @assafv1
    @assafv1  3 years ago

    Question: is it always the case that if, for some value of alpha, say alpha_0, a tree trained on all the training data has some number of terminal nodes, say N, then for the same value of alpha a tree trained on only some of the training data (some of the folds from cross validation) will also have the same number of terminal nodes N?

    • @statquest
      @statquest  3 years ago

      No. This is why we use cross validation and repeat the process to get a sense of what the average is.

  • @mikestev8539
    @mikestev8539  4 years ago

    Good explanation in general, especially since this topic is difficult. But can you suggest where I could learn more about post-pruning decision trees in R?

    • @statquest
      @statquest  4 years ago

      Unfortunately, I only have a video that shows these steps in Python: czcams.com/video/q90UDEgYqeI/video.html

  • @user-fi2vi9lo2c
    @user-fi2vi9lo2c  9 months ago

    Dear Josh, thanks a lot for this video! It's awesome! You told us how to prune regression trees and your explanation was very clear. I've got a question: how can I prune classification trees? What is the biggest difference between regression trees and classification trees when pruning? I guess the Tree Score is calculated in a different way when we prune classification trees. Can we simply add the tree complexity penalty (alpha * number of leaves) to the Gini Impurity to get a Tree Score in the case of a classification problem?

    • @statquest
      @statquest  9 months ago

      You pretty much add alpha to the total gini impurity for the tree. See: scikit-learn.org/stable/auto_examples/tree/plot_cost_complexity_pruning.html
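
      To make that concrete, here is a minimal sketch using scikit-learn's cost complexity pruning for a classifier (the iris data is just a stand-in; cv=5 and random_state=0 are arbitrary choices):

        import numpy as np
        from sklearn.datasets import load_iris
        from sklearn.model_selection import cross_val_score
        from sklearn.tree import DecisionTreeClassifier

        X, y = load_iris(return_X_y=True)

        # For classification, the pruning path is based on the total
        # impurity of the leaves (e.g., Gini) instead of the SSR.
        path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

        # Cross-validate each candidate alpha and keep the most accurate one.
        scores = [cross_val_score(DecisionTreeClassifier(random_state=0, ccp_alpha=a),
                                  X, y, cv=5).mean()
                  for a in path.ccp_alphas]
        best_alpha = path.ccp_alphas[np.argmax(scores)]
        print(best_alpha)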

  • @mohamedhanifansari9224
    @mohamedhanifansari9224  4 years ago +1

    Thanks Josh, for your lucid explanation. We are longing for the XGBoost videos. Any updates on that?

    • @statquest
      @statquest  4 years ago +2

      This was actually the very first XGBoost video. XGBoost uses unique trees, and to understand why, you have to know everything about normal regression trees. This information was originally in an XGBoost StatQuest, but it was too much, so I made it a stand-alone video. That being said, the next video I put out, in the next two weeks or so, will be on XGBoost, then the next and the next, etc. XGBoost is a huge algorithm with a lot of parts. I've got 3 videos worth of material so far and I've only scratched the surface. I'm expecting to have at least 4 XGBoost videos, maybe more.

    • @mohamedhanifansari9224
      @mohamedhanifansari9224  4 years ago +2

      @@statquest Thanks so much for the reply. I've seen all your tree-based videos. For the last two days, I've been reading about XGBoost all over the internet, and it was quite difficult to grasp the whole picture of XGB. I genuinely thought it would be a lot easier and more intuitive if it were explained by you. Appreciate all your work. Couldn't be more excited for the future XGBoost videos. BAM!

    • @statquest
      @statquest  4 years ago +3

      @@mohamedhanifansari9224 Just a few more weeks to wait! (and I'm just as excited as you are about this XGBoost thing - it's become an obsession!)

    • @mohamedhanifansari9224
      @mohamedhanifansari9224  4 years ago +1

      @@statquest Thanks so much! :')

  • @Theviswanath57
    @Theviswanath57  3 years ago

    In the "Use cross validation to compare alphas" section, I think it's better to compute cross-validation metrics for each alpha and then decide the best alpha based on those cross-validation metrics.

    • @statquest
      @statquest  3 years ago +1

      Yes, that's what "use cross validation to compare alphas" means.

  • @lolikpof
    @lolikpof  1 year ago +1

    Cost Complexity Pruning is otherwise known as post-pruning, while limiting the tree depth, enforcing a minimum number of samples per leaf/split, or a minimum impurity decrease is known as pre-pruning, correct? My question is: can you apply pre-pruning first, and then apply post-pruning to the pre-pruned tree? If yes, then I assume that the alpha parameters will be found from the pre-pruned tree, right? Not the initial full tree? And then cross validation will also be performed with the pre-pruned tree, not the initial full tree, to determine the final optimal alpha?
    And on a separate note, should only the alphas obtained from the tree trained on all the data be used when cross validating, and why? Is there no chance that some other, random alpha might result in better performance? Considering that cross validation is done on several different test/train splits, and there will be splits that do better with one alpha and splits that do better with other alphas, doesn't it make sense to try all possible alphas (from 0 to infinity) in the cross validation, not only those that give the best tree scores for the full tree? Isn't there a chance that some other alpha will give, on average, a lower sum of squared residuals than those obtained from the full tree?

    • @statquest
      @statquest  1 year ago

      I believe you are correct about how pre-pruned trees are used. And this is just how the algorithm is spelled out - possibly to keep the running time down.

  • @yasserothman4023
    @yasserothman4023  2 years ago

    @3:35 When you remove the leaves, how do you change the decision rule of the parent node?
    @11:55 When you build the tree from the full data set, how do you get the SSR? I mean, what is the input used to get it?

    • @statquest
      @statquest  2 years ago

      1) The parent node reverts to the decision it made before the branch was added. In this case that means the average of the drug effectiveness for all values with dosage >= 14.5
      2) Calculating the SSR is described in the video that explains Regression Trees here: czcams.com/video/g9c66TUylZ4/video.html

  • @sdsachin24
    @sdsachin24  2 years ago

    @StatQuest with Josh Starmer, I have purchased your book, but I didn't find these concepts (pruning, random forest, AdaBoost, gradient boosting) in it. Is there a way to access these presentation slides?

    • @statquest
      @statquest  2 years ago

      Those will be in a future book.

  • @xinranwen4849
    @xinranwen4849  4 years ago +2

    This video is awesome! Why didn't I know about StatQuest earlier? It really helps, THX!!!
    btw: I'm confused about the building of the trees at 13:19. How do we know that it is the bottom 2 leaves that should be cut for a new training set when α = 10,000 (pre-calculated)? Is that just a coincidence, or does the cut depend on which leaves give the lowest Tree Score?

    • @xinranwen4849
      @xinranwen4849  4 years ago

      My understanding of the pruning process is this:
      ① use the whole data set to build a full-sized tree, then increase alpha from 0 to get different pruned sub-trees with lower Tree Scores, each corresponding to a different alpha
      ② build a new training set and testing set from the whole data, then build a full-sized tree, use the alphas we found before to build sub-trees, and calculate the SSR on the testing data
      ③ repeat ② until we have done 10-fold cross validation
      ④ for each iteration, choose the alpha that has the lowest SSR
      ⑤ calculate the average among all the alphas in ④ to get its final value
      Am I getting it right?

    • @statquest
      @statquest  4 years ago

      You're correct about everything except 5. We don't calculate the average of the alphas, we calculate the average of the sum of the squared residuals for each level of alpha, and select the level of alpha that corresponds to the lowest average sum of the squared residuals.
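
      A tiny sketch of that bookkeeping in Python, with made-up SSRs (rows are cross-validation folds, columns are the candidate alphas):

        import numpy as np

        alphas = np.array([0.0, 10_000.0, 15_000.0])   # candidate alphas
        ssr = np.array([[12.0, 10.5, 11.8],            # made-up test-set SSRs
                        [13.1,  9.9, 12.4],
                        [11.7, 10.8, 12.0]])

        # Average the SSR over the folds for each alpha, then pick the
        # alpha with the lowest average SSR.
        best_alpha = alphas[np.argmin(ssr.mean(axis=0))]
        print(best_alpha)  # -> 10000.0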

    • @xinranwen4849
      @xinranwen4849  4 years ago +2

      @@statquest Thx so much for your reply and correction! I think I am starting to get the process and concept, but I still need more time to fully understand it. Your videos are simply the most helpful, vivid, and clear! Thx again for all the effort and sharing!
      I'm moving on to SVM right now hahaha

    • @xinranwen4849
      @xinranwen4849  4 years ago

      @@statquest and BAAAM!

    • @_curiosity...8731
      @_curiosity...8731  3 years ago

      @@statquest I am also having the same doubt; can you please answer the original question? "How do we know that it is the bottom 2 leaves that should be cut for a new training set when α = 10,000 (pre-calculated)? Is that just a coincidence, or does the cut depend on which leaves give the lowest Tree Score?"

  • @ankurmazumder5590
    @ankurmazumder5590  3 years ago

    When varying alpha and checking for which alpha the pruned tree performs better, how do we find which pair of leaves to prune when there is more than one candidate pruned tree?

    • @statquest
      @statquest  3 years ago +1

      Each pair of leaves accounts for a specific amount of the overall sum of the squared residuals. When you increase alpha, you want to remove the pair of leaves that accounts for the least amount of the sum of the squared residuals.

  • @amnont8724
    @amnont8724  1 year ago +1

    Hey Josh, how can we choose our alpha wisely, so that the tree with the minimum tree score will really work well on testing data too? Is there a specific rule of thumb?

    • @statquest
      @statquest  1 year ago

      I give a practical tutorial on building trees with real data here: czcams.com/video/q90UDEgYqeI/video.html

  • @a_sun5941
    @a_sun5941  4 years ago

    What should the range of alphas to choose from during cross validation be:
    5,000 to 20,000, in increments of 5,000?
    Or does it depend on the SSR value of the full tree,
    so we customize the alpha range according to the full tree's SSR?

    • @statquest
      @statquest  4 years ago

      It depends on the tree. When you build a tree with sklearn in python, the function can return the possible values for alpha so you don't have to solve for them.

  • @juliankoch5704
    @juliankoch5704  1 year ago +1

    I'm curious as to how we find sensible alphas. There seems to be an explanatory gap here, since in the section "comparing pruned trees with alpha" it says (in the NOTE at 8:23) that we find it during cross validation, but in the cross validation section at 13:18 we are supposed to "use the alpha values we found before". Sensible alpha values would probably vary widely depending on the SSRs of the trees (and ultimately the variable ranges), and even more so for classification, since Gini and entropy give small values for which alphas of 10k would not be useful. Surely simply guessing various alpha levels and finding which of the guessed ones work best in cross validation is not the best method, or am I misunderstanding something here?

    • @statquest
      @statquest  1 year ago

      We use all of the data to find candidate values for alpha (this is demonstrated at 11:18). Once we have candidate values, we test each one with cross validation to find the optimal value for alpha.

  • @sushilchauhan2586
    @sushilchauhan2586  4 years ago +1

    black kitty
    white kitty
    STATQUEST is best!

  • @corrinechou5271
    @corrinechou5271  2 years ago +1

    Well explained, thank you so much

  • @thepresistence5935
    @thepresistence5935  2 years ago +1

    Teachers all over the world must learn from Josh, bro!