Using Bootstrapping to Calculate p-values!!!

  • Published 6 Jul 2024
  • Bootstrapping gives us an easy way to calculate p-values for just about anything - no fancy math required! In this StatQuest, we walk through the process of calculating the p-value for a mean, and then a median, step-by-step. DOUBLE BAM!
    NOTE: This StatQuest assumes that you are already familiar with the main ideas behind bootstrapping: • Bootstrapping Main Ide...
    Hypothesis Testing and the Null Hypothesis: • Hypothesis Testing and...
    How to interpret p-values: • p-values: What they ar...
    And the basics of how p-values are calculated: • How to calculate p-values
    For a complete index of all the StatQuest videos, check out:
    statquest.org/video-index/
    If you'd like to support StatQuest, please consider...
    Buying my book, The StatQuest Illustrated Guide to Machine Learning:
    PDF - statquest.gumroad.com/l/wvtmc
    Paperback - www.amazon.com/dp/B09ZCKR4H6
    Kindle eBook - www.amazon.com/dp/B09ZG79HXC
    Patreon: / statquest
    ...or...
    CZcams Membership: / @statquest
    ...a cool StatQuest t-shirt or sweatshirt:
    shop.spreadshirt.com/statques...
    ...buying one or two of my songs (or go large and get a whole album!)
    joshuastarmer.bandcamp.com/
    ...or just donating to StatQuest!
    www.paypal.me/statquest
    Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
    / joshuastarmer
    0:00 Awesome song and introduction
    2:04 Calculating the p-value for a mean with bootstrapping
    6:03 Calculating the p-value for a median with bootstrapping
    #StatQuest #Statistics #Bootstrapping
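The procedure walked through in the video (for the mean) can be sketched in Python. This is a minimal illustration, not the video's actual code, and the eight data values are made up:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical measurements of the drug's effect in 8 people
# (made-up numbers, not the ones from the video).
data = np.array([-1.2, 0.3, 0.5, 0.9, 1.1, 1.4, -0.4, 1.4])
observed_mean = data.mean()

# Shift the data so its mean is 0: this simulates the null
# hypothesis that, on average, the drug has no effect.
shifted = data - observed_mean

# Resample the shifted data WITH replacement many times and
# record each bootstrap sample's mean.
n_boot = 10_000
boot_means = np.array([
    rng.choice(shifted, size=shifted.size, replace=True).mean()
    for _ in range(n_boot)
])

# Two-sided p-value: the fraction of bootstrapped means at least
# as extreme as the observed mean, in either direction.
p_value = np.mean(np.abs(boot_means) >= abs(observed_mean))
```

Swapping `.mean()` for `np.median` gives the median version shown in the second half of the video.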

Comments • 227

  • @statquest
    @statquest  2 years ago

    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

  • @maxyen9892
    @maxyen9892 6 days ago +1

    I really appreciate how you start off with a simple application example and then build up from there with explanations and real-time drawings. Lots of times when I read about concepts, they start more abstract or from theory, and that makes it less intuitive.

    • @statquest
      @statquest  6 days ago

      Thank you! I'm glad you appreciate my style.

  • @insertacoin738
    @insertacoin738 3 years ago +69

    I have really no words to express how incredibly amazing, clear and enlightening your videos are, you transform the historically "hard and complex" concepts into kids' games, it is astoundingly magnificent, almost majestic. Thank you, really, from the bottom of my heart. You deserve a whole university named after you.

    • @statquest
      @statquest  3 years ago +1

      Thank you so much 😀

    • @insertacoin738
      @insertacoin738 3 years ago

      @@statquest I just came across this, Josh. Do you have any explanation for why this happens? stats.stackexchange.com/questions/535343/bootstrapped-mean-always-almost-identical-to-sample-mean

    • @statquest
      @statquest  3 years ago +3

      @@insertacoin738 Yes, I do. First, keep in mind that the person who posted that is calculating the mean of the bootstrapped means. And this mean of means is very similar to the original mean. In other words, the mean of the histogram that bootstrapping created is centered on the mean of the original data. That's to be expected. Bootstrapping works because the sample distribution is an estimate (not an exact copy) of the population distribution. This estimate gets better as the sample size increases.

  • @KoLMiW
    @KoLMiW 2 years ago +28

    I started watching these videos to prepare for my Introduction to Machine Learning exam but now I just watch them because it's fun to learn about it when it is so well explained. Thank you for your effort!

  • @rattaponinsawangwong5482
    @rattaponinsawangwong5482 3 years ago +12

    The way you explain the bootstrap is so good. It makes it simpler for everyone.

  • @OwenMcKinley
    @OwenMcKinley 3 years ago +16

    Thank you! I've never really realized the power of bootstrapping until watching your 'Quests. Great stuff 👍👍

  • @summerai8724
    @summerai8724 a year ago +1

    Thanks a lot for the explanation. I was confused about how to create a simulated distribution for calculating the p-value, and this video explains it really well. Shifting the data to a mean of zero before resampling is the key!

  • @mattsmith4027
    @mattsmith4027 6 months ago +2

    Literal black magic.
    Cheers so much for making this! I had some data that was a pain in the butt to get, and I'm trying to pull all I can out of it; this really helped!

  • @kanikabagree1084
    @kanikabagree1084 3 years ago +2

    This channel deserves at least a million subscribers!

  • @zivot6822
    @zivot6822 a year ago +3

    You just saved my work report, keep it up man.

  • @moali001
    @moali001 3 years ago +2

    Damn, that's some good quality here! Hope to see more videos!

  • @XoCortanaXo
    @XoCortanaXo 9 months ago +1

    This is exactly what I was looking for, thank you!

  • @bLuemaNMKO
    @bLuemaNMKO 3 years ago +3

    your work is amazing

  • @goonerrn
    @goonerrn 3 years ago +1

    Josh just made part 2 so he could sing "part 2... calculate p-value". This is a gem!

  • @usmanazhar7073
    @usmanazhar7073 3 years ago +1

    Really informative, thank you so much for uploading

  • @user-gp3ts9ib6f
    @user-gp3ts9ib6f a month ago +1

    Thank you for sharing your knowledge. This video is helpful for me.

  • @user-tn1nw8th6g
    @user-tn1nw8th6g a year ago +1

    It's very easy to understand! Super explanation
    Thank you

  • @saeidsas2113
    @saeidsas2113 a month ago +1

    I finally did it for my real problem case.

  • @KirillBezzubkine
    @KirillBezzubkine 3 years ago +1

    God bless you, mister.

  • @alikoushki6483
    @alikoushki6483 a year ago +1

    Great video, thanks

  • @mikelmenaba
    @mikelmenaba a year ago +1

    Great video mate

  • @rayman2704
    @rayman2704 2 years ago +1

    Thank you soooooooooooooo much!

  • @tejasbhagwat877
    @tejasbhagwat877 3 years ago +1

    Hi Josh, Big fan of your videos (and merchandise)! They are incredibly helpful :)
    Could you please also do a series on running models in Bayesian framework?

    • @statquest
      @statquest  2 years ago

      Yes, that's a plan.

    • @tejasbhagwat877
      @tejasbhagwat877 2 years ago +1

      @@statquest That would be a TRIPLE BAM! Looking forward :)

  • @l.josephineandresen610
    @l.josephineandresen610 2 years ago +1

    Thanks so much! These videos are really great. I was wondering if you will make one on Mixed ANOVAs? :-) Your explanations really help to understand the concepts quickly.

  • @PastryDonut
    @PastryDonut a year ago +2

    I'm just following your Fundamentals playlist in order. My first encounter with statistics ever. Thank you so much for putting it together!! Can you recommend any collection of beginner stat problems to practice on? It would help to learn tremendously.

    • @PastryDonut
      @PastryDonut a year ago

      Also thank you for stripping away most of the terminology! Can't imagine learning this from a regular lecture or a textbook, ugh.

    • @statquest
      @statquest  a year ago +1

      I'm glad you are enjoying the video. I have a few "beginner" stat problems here statquest.org/video-index/ (just search for "StatTest")

    • @PastryDonut
      @PastryDonut a year ago +1

      @@statquest Awesomeness, thank you!

  • @shirleygui6533
    @shirleygui6533 a year ago +1

    awesome!

  • @marcoventura9451
    @marcoventura9451 3 years ago +1

    I wish I had more time for your videos. Not only are they high-standard pieces of higher education, but also a moment to relax and enjoy the day.

  • @ismailalkhalaf6061
    @ismailalkhalaf6061 8 days ago +1

    Great video!! Thank you so much. 🌻🌻
    Would you please make some other videos about wild bootstrapping?

  • @alexvass
    @alexvass a year ago +1

    Thanks

    • @statquest
      @statquest  a year ago

      BAM! Thank you so much for contributing to StatQuest!!!

  • @petercourt
    @petercourt 3 years ago +7

    Awesome video Josh! Really well explained, as usual. I was curious as to how the data is shifted (e.g. what function is applied) so that you can get from your original mean, to a mean of zero. Otherwise I think I understood everything!

    • @statquest
      @statquest  3 years ago +6

      BAM! :) We just subtract the original mean value from all of the original values to shift the data.

    • @petercourt
      @petercourt 3 years ago +1

      @@statquest Haha, I should've thought of that! Thanks Josh!
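In code, the shift Josh describes is a one-liner. A tiny sketch with made-up numbers:

```python
import numpy as np

data = np.array([0.2, 1.1, -0.3, 0.9])   # hypothetical measurements
shifted = data - data.mean()              # subtract the original mean
# shifted.mean() is now 0 (up to floating-point rounding)
```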

  • @finanzassainz4013
    @finanzassainz4013 a year ago +1

    OMG, this topic looked too complicated to learn about; however, you make it so easy.

  • @FedericoMerlo-tx2uq
    @FedericoMerlo-tx2uq 9 days ago

    Great video, as always. I admire your work, your knowledge, and your ability to make concepts understandable so much.
    What if our interest is to compare some statistic between two different groups?
    For example, the mean difference between two groups:
    - Calculate the difference of the two group means
    - Bootstrap each group by itself
    - Calculate the bootstrap mean difference and subtract the observed mean difference
    - Repeat to obtain the bootstrap mean difference under the null hypothesis of no mean difference?
    Could that make sense?
    Thank you so much

    • @statquest
      @statquest  8 days ago

      Here's a discussion on how to use the bootstrap to compare two means: stats.stackexchange.com/questions/92542/how-to-perform-a-bootstrap-test-to-compare-the-means-of-two-samples
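One common version of the two-group comparison discussed in that thread can be sketched as follows, assuming we center each group on its own mean so that the expected difference of means is 0 under the null (the group values are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical measurements for two groups (made-up numbers).
a = np.array([1.1, 0.4, 1.8, 0.9, 1.3])
b = np.array([0.2, -0.1, 0.6, 0.3, 0.0])
observed_diff = a.mean() - b.mean()

# Center each group on its own mean: under the null hypothesis,
# the difference of the bootstrap means is then centered on 0.
a0, b0 = a - a.mean(), b - b.mean()

n_boot = 10_000
diffs = np.empty(n_boot)
for i in range(n_boot):
    ra = rng.choice(a0, size=a0.size, replace=True)   # resample group A
    rb = rng.choice(b0, size=b0.size, replace=True)   # resample group B
    diffs[i] = ra.mean() - rb.mean()

# Two-sided p-value: how often a null difference is at least as
# extreme as the one we actually observed.
p_value = np.mean(np.abs(diffs) >= abs(observed_diff))
```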

  • @thbdf3879
    @thbdf3879 2 years ago +1

    I wish I could have seen this video earlier, before my exam.

  • @jamesstrickland833
    @jamesstrickland833 2 years ago +2

    Must we always consider both tails when calculating a p-value from bootstrapping? Had we looked at the medians and only considered the right tail, that would have been significant (@.05) to reject H0. Or did we assume that Ha was not equal to zero and therefore a two-tailed test?

    • @statquest
      @statquest  2 years ago +3

      You don't always need to use two-tailed p-values. However, I think it is almost always a mistake to not use two-tailed p-values. Not once in my career as a biostatistician did I use a single sided test. If you want to know why, see: czcams.com/video/JQc3yx0-Q9E/video.html

  • @PeteHwang
    @PeteHwang 2 months ago

    Hi Josh, thank you for the great video. I had a question at 4:57. Why do you look at the probabilities of observing means ≤ or ≥ ±0.5 in the bootstrap distribution?

    • @statquest
      @statquest  2 months ago

      Are you already familiar with p-values? If not, check out these two videos: czcams.com/video/vemZtEM63GY/video.html and czcams.com/video/JQc3yx0-Q9E/video.html I believe those will answer your question.

  • @rissalhedna5534
    @rissalhedna5534 5 months ago +1

    Amazing video as usual! I was just wondering why the value of 0.05 was used as a threshold for the p-value. Was it arbitrarily set, or did we assume that it was meaningful for our experiment with the drug?

    • @statquest
      @statquest  5 months ago

      I explain p-value thresholds here: czcams.com/video/vemZtEM63GY/video.html

  • @lbb2rfarangkiinok
    @lbb2rfarangkiinok 2 years ago +1

    the jingles are off the chain

  • @EdoardoMarcora
    @EdoardoMarcora 3 years ago +1

    Wouldn't shifting the bootstrap distribution that was obtained from the original sample data be basically equivalent (for the purpose of calculating a p-value) to the bootstrap null distribution?

  • @thegimel
    @thegimel 3 years ago +1

    It sounds like calculating p-values from bootstrapping can lend itself to p-hacking, if you find "the right" statistic that does lead to rejecting the null hypothesis because of some reason (e.g. being more or less sensitive to outliers). What do you think?

    • @statquest
      @statquest  3 years ago +1

      That's why for everything in statistics, you plan what you are going to do (what metric you are going to use etc.) before collecting data.

  • @ModernTolkien143
    @ModernTolkien143 2 years ago

    Hey Josh, thanks for this awesome video!!
    Do you know of any reference (paper, handbook chapter etc.) that shows the asymptotic validity of the approach you are using?
    Best, Sebastian

    • @statquest
      @statquest  2 years ago

      Here's a great place to start if you want to learn more details: en.wikipedia.org/wiki/Bootstrapping_(statistics)

  • @DeepROde
    @DeepROde 2 years ago

    Hey, your videos are a treasure! I had a doubt, at 6:18, the histogram of median doesn't look bell-shaped. This made me wonder whether the distribution of medians would be Normal (like distribution of means) or not, could you please let us know?

    • @statquest
      @statquest  2 years ago +1

      The distribution of medians is not normally normal.

  • @LittleLightCZ
    @LittleLightCZ a year ago

    The main question is, how is it different from simply running a t.test to see if the mean equals to 0 or not? Is there anything that bootstrapping adds to it? Originally I thought that bootstrapping might help for example to get tighter confidence intervals without the need to take more sample data in the field, but according to my tests which I made with boot library, the confidence intervals from the bootstrapped data are basically the same as the ones computed from the original data. Well, when I call boot.ci() they tend to be a little bit tighter, but I think it's because the t.test computation is probably a little more conservative (I guess).

    • @statquest
      @statquest  a year ago +1

      The purpose of bootstrapping isn't to replace a t-test, or any other known statistical test. Those known tests will always perform better because they make assumptions about the data that bootstrapping does not, and that results in them having an edge. However, the magic with bootstrapping is that it can be used to calculate p-values or confidence intervals in any situation - including those that are not appropriate for t-tests or any other known test. For example, with bootstrapping we can compare medians or modes instead of means, and you can't do that with a t-test.
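To make that point concrete: in a bootstrap p-value calculation, swapping the statistic really is the only change. A sketch for a median, with made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)

data = np.array([-0.5, 0.2, 0.4, 0.8, 1.1, 1.5, 1.9, 2.2])  # hypothetical
observed_median = np.median(data)

# Shift so the median is 0, the value claimed by the null hypothesis.
shifted = data - observed_median

# Resample with replacement, recording each sample's MEDIAN
# instead of its mean -- a t-test has no equivalent for this.
boot_medians = np.array([
    np.median(rng.choice(shifted, size=data.size, replace=True))
    for _ in range(10_000)
])

# Two-sided p-value for the median.
p_value = np.mean(np.abs(boot_medians) >= abs(observed_median))
```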

  • @PauloBuchsbaum
    @PauloBuchsbaum 11 months ago

    Great video, and I understood your procedure perfectly.
    I just believe that the shift of 0.5 to the left before redoing the bootstrap (taking the mean to 0) would not be strictly necessary (except for ease of understanding).
    I think that instead of redoing the shifted bootstrap, it would be enough, in the original bootstrap, to take the probability above 1.0 plus the probability below 0.0.
    In the original bootstrap this would correspond, respectively, to the probability above 0.5 and below -0.5 after shifting 0.5 to the left.
    Am I wrong?
    Another point is that at 4:11 the probability above 0.5 was 48%, but at 5:04, to get the p-value, you used 47%.

  • @xinlu82
    @xinlu82 2 years ago

    Thanks a lot. Really nice video. I have a question about the number of replicates when doing the bootstrapping. Is this related to the sample size?

    • @statquest
      @statquest  2 years ago +1

      In a small way it is dependent on the sample size (if the sample size is small, there are only so many different bootstrapped samples you can create).

  • @PsyK0man
    @PsyK0man 3 years ago +4

    Clarification needed: to fail to reject the hypothesis that the drug has 0 effect, does it mean that we don't reject the null hypothesis, and that the experiment is not statistically significant? Does this therefore mean that we cannot conclude whether the drug is effective or not? Or that the drug is not effective?

    • @v0ldelord
      @v0ldelord 3 years ago +4

      It means that we do not have enough evidence to exclude that the drug has no effect. Or in other words we can't conclude that the drug is effective.

    • @statquest
      @statquest  3 years ago +6

      @@v0ldelord BAM! :)

    • @statquest
      @statquest  3 years ago +2

      To learn more about hypothesis testing, check out czcams.com/video/0oc49DyA3hU/video.html

  • @themoan
    @themoan 3 years ago

    Hi Josh, do you have to make assumptions about normality of the data? Or does bootstrapping work for parametric and non parametric cases (because of the central limit theorem)? Thank you for another informative video!

    • @statquest
      @statquest  3 years ago

      Bootstrapping makes no assumptions about the data.

  • @juanete69
    @juanete69 5 months ago

    How do you use bootstrapping when you have several variables? For example, for a regression model.
    How would you use it to test the standard deviation?

    • @statquest
      @statquest  5 months ago

      See: www.sciencedirect.com/science/article/abs/pii/S0167715217303450

  • @caroldanvers6306
    @caroldanvers6306 a year ago

    Great video and helpful examples! What do you do when you're testing the median (with H0: median = 0; HA: median not 0) and the observed median is 0? As there is no shift, I'm thinking the p-value is 1.000 (as all of the bootstrapped medians are either >= 0 or <= 0).

  • @Daniel88santos
    @Daniel88santos a year ago

    Great video! Is this the working principle of "Particle Filters"/"Sequential Monte Carlo"?

    • @statquest
      @statquest  a year ago

      I have no idea. I've never heard of those things before. :(

  • @streetsmart5033
    @streetsmart5033 3 years ago +1

    Sir, please explain convolutional neural networks. I'm eagerly waiting for your way of explaining them.

    • @statquest
      @statquest  3 years ago

      I've already done that, see: czcams.com/video/CqOfi41LfDw/video.html For a complete list of all of my videos, see: statquest.org/video-index/

    • @streetsmart5033
      @streetsmart5033 3 years ago

      @@statquest Yes sir, thank you for the reply, but in that playlist there is no CNN or RNN.

  • @joeguerriero3841
    @joeguerriero3841 4 months ago

    But how would you do this for a test statistic (like a correlation coefficient), where creating a "null data set" from which to resample is not as straightforward as just mean-centering the data?

    • @statquest
      @statquest  4 months ago

      See: www.sciencedirect.com/science/article/abs/pii/S0167715217303450

    • @joeguerriero3841
      @joeguerriero3841 4 months ago +1

      TRIPLE BAM!!@@statquest

  • @user-on7vj1em3k
    @user-on7vj1em3k 2 years ago

    Thank you!
    Why do you calculate ±0.5 in the histogram and not only 0 to 0.5?

    • @statquest
      @statquest  2 years ago

      What time point in the video, minutes and seconds, are you asking about?

  • @alinaastakhova8412
    @alinaastakhova8412 a month ago

    Thank you for the amazing explanation; still, I am a little confused. At 4:10 of the video you have the probability for a mean >= 0.5 as 0.48, and at 5:02 the probability for a mean >= 0.5 becomes 0.47. How is that? And for the median: how do you get the probability for a median >= 1.8 as 0.01? How is that calculated when the bootstrapped distribution of medians does not go beyond ~0.5 units? Isn't the calculated probability simply the portion of the distribution beyond the given value (like 1.8 for the median in our example)? What do I miss?

    • @statquest
      @statquest  a month ago

      1) That's just a minor typo.
      2) We count the number of bootstrap-generated medians >= 1.8 and divide by the total number of bootstrap-generated medians.

    • @alinaastakhova8412
      @alinaastakhova8412 a month ago +1

      ​@@statquest Thanks!
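That count-and-divide step is a one-liner in code; here `boot_medians` is a made-up stand-in for the medians of the bootstrapped samples:

```python
import numpy as np

# Stand-in values for illustration only.
boot_medians = np.array([-1.9, -0.2, 0.1, 0.4, 1.7, 1.8, 2.0])

# Fraction of bootstrap medians >= 1.8 (2 of the 7 values here).
p_right = np.mean(boot_medians >= 1.8)
```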

  • @frashertseng9426
    @frashertseng9426 3 years ago

    Thank you for the awesome video. 1) How does this apply to comparing means from two different groups (ctrl/test)? 2) What if my measure is a proportion (%)? How can we apply this method?

    • @statquest
      @statquest  3 years ago

      1) see: stats.stackexchange.com/questions/128694/bootstrap-two-sample-t-test
      2) see: online.stat.psu.edu/stat200/lesson/4/4.3/4.3.1

    • @frashertseng9426
      @frashertseng9426 3 years ago +1

      @@statquest Thank you Josh!!

  • @Julsten3107
    @Julsten3107 3 years ago

    Hey Josh, thanks for this comprehensive explanation!
    I'm a bit confused why you need to add values greater than or equal to 0.5 but also values less than or equal to -0.5 for the p-value. Why can't I just look at values >= 0.5?

    • @HankGussman
      @HankGussman 3 years ago

      It is 0.05 actually. To reject the null hypothesis, the observed results must be rare, such that the probability of observing such results is at most 0.05.

    • @statquest
      @statquest  3 years ago +1

      In this video we calculate a two-sided p-value and I describe these, and the reasons for them, extensively in this StatQuest on p-values: czcams.com/video/JQc3yx0-Q9E/video.html

  • @AkashSiddabattula
    @AkashSiddabattula 6 months ago

    Please reply!!
    When you were calculating the p-value, I think we were supposed to find the p-value supporting the null hypothesis, and if that value is less than 0.05 we can reject the null hypothesis. But here you were calculating the p-value of observing a mean value of 0.5 or something more extreme, and I think this is not supposed to be the null hypothesis. Then, if we get a p-value greater than 0.05 of observing a mean >= 0.5, that means we will often get a mean >= 0.5, which means the drug has some effect. This is what I understood; can you explain?

    • @statquest
      @statquest  6 months ago +1

      In this video, the null hypothesis is that, on average, the drug has no effect (average effect = 0). We then use bootstrapping to calculate a p-value for this null hypothesis and we get 0.63, so we fail to reject the null hypothesis that the drug has no effect. In other words, there's a high likelihood that any random set of 8 people that have the disease will have, on average, an effect = 0.5.

    • @AkashSiddabattula
      @AkashSiddabattula 6 months ago +1

      Thank you so much

  • @jiangshaowen1149
    @jiangshaowen1149 2 years ago +1

    Hi Josh, may I know the reason why the p-value is calculated two-sided?

    • @statquest
      @statquest  2 years ago

      Because 99 times out of a 100 you always want a two-sided p-value. For details, see: czcams.com/video/JQc3yx0-Q9E/video.html

  • @PuneetMehra
    @PuneetMehra 20 days ago

    1:54 - Since the 95% CI includes 0, we can't reject the null hypothesis (the drug isn't working). Why? What does the inclusion of 0 in the CI have to do with null hypothesis rejection? I am confused.
    PS: I have studied all previous videos.

    • @statquest
      @statquest  19 days ago +1

      When the confidence interval contains 0, then we can't be confident that the true value is not 0, even though our estimate is not 0. In other words, there is enough variation in the data that we can't have a lot of confidence in the estimate we made with it.

  • @rupiyaldekai6136
    @rupiyaldekai6136 3 years ago

    Can you do a PyTorch implementation for ANNs and fuzzy systems, please sir?

  • @DrMcZombie
    @DrMcZombie 3 years ago

    Hi Josh, and thanks for the overview. I have been using bootstrapping for quite some time now, but not to look at p-values for just one data set. What you describe is, more or less, a different kind of t-test, right?
    I am using bootstrapping for determining confidence intervals, but also to compare two datasets, e.g., I use two models to predict data and compare the models' performance with bootstrapping.
    For example, is the root-mean-squared prediction error (RMSE) larger in data set A in comparison to data set B?
    When repeating this (e.g.) 1000 times, each time comparing the RMSEs, I get a p-value from these comparisons.
    --> Model A performed better than model B in 990 of 1000 comparisons --> p = 0.99 (or 0.01)
    I hope this was understandable.
    What are your thoughts on this application of bootstrapping?

    • @statquest
      @statquest  3 years ago

      This example is like a one-sample t-test (without having to refer to the t-distribution). Your experiment is a little confusing. You have data sets A and B and also models A and B, so I don't know what you are comparing.

    • @DrMcZombie
      @DrMcZombie 3 years ago

      @@statquest Thanks, and I'll try to explain a bit more: I have data that I measured (in my case Speech Recognition Thresholds, i.e., the signal-to-noise ratio at which 50% of spoken words can be understood in a noisy environment; I hope this is not getting too abstract). I want to simulate this data with different models, and I want to determine which model is better (e.g., model A and model B).
      To figure out which model is better, I create a bootstrapped data set of the measured data and calculate the RMSE for both model simulations. Let's say the RMSE for the bootstrapped data set is 1 for model A and 2 for model B. I compare these values and count how often the RMSE of model A was lower than the RMSE of model B:
      --> For this first comparison, I count 1.
      Second run: RMSE of model A is 1.5, RMSE of model B is 1.4
      --> I do not count this (1 of 2 comparisons indicates that the RMSE of model A is lower than the RMSE of model B).
      When repeating this procedure 1000 times, 990 of the comparisons showed that model A has a lower RMSE, and in 10 comparisons model B had a lower RMSE.
      I consider this to yield a p-value of 0.99 (which is effectively a p-value of 0.01).
      I hope you find this interesting, and I would be happy to get your thoughts on this application of bootstrapping.

    • @statquest
      @statquest  3 years ago +2

      @@DrMcZombie You've calculated a probability, which is part of a p-value, but not a p-value. A p-value is the probability of the observed result or data plus the probabilities of all results that are more extreme. For details, see: czcams.com/video/JQc3yx0-Q9E/video.html
      So, here's what you should do (or consider doing):
      0) The null hypothesis is that there is no difference between models A and B. This means that we would expect the difference in RMSE to be 0 between models A and B.
      1) Bootstrap your data, run it through your models and make a histogram of differences in RMSE.
      2) Draw a 95% CI between the 2.5% quantile and the 97.5% quantile of that histogram
      3) Does that CI include 0? If so, fail to reject the hypothesis that models A and B are the same. If not, reject the hypothesis that models A and B are the same. Bam.

    • @DrMcZombie
      @DrMcZombie 3 years ago

      @StatQuest with Josh Starmer Thank you for your reply, and I also see the point that you make. But just to clarify: Wouldn't this boil down to the "counting the comparisons approach"? (not with regard to the p-value, but just for failing to reject the null hypothesis)
      When 10 of 1000 comparisons (1%) showed, that model A had a lower RMSE than model B, then the 95%-CI of the histogram of differences between the models would not include 0.
      The CI would include 0 when 25 or more of 1000 comparisons (i.e. more than 2.5 % of the comparisons) would show that model A has a lower RMSE than model B.
      Anyway, thanks, and I am looking forward to more of your great videos.
      --> octave code example (e.g. use octave-online.net/):
      % let's assume A and B are the RMSEs of two models.
      % H1: A is significantly different from B (0 not in 95%-CI of the difference histogram)
      % H0: A and B are the same (0 in 95%-CI)
      A = randn(10000,1) + 3; % random numbers, mean = 3; std = 1;
      B = randn(10000,1); % same, but mean = 0;
      hist(A-B); % draw histogram
      comparisons = sum(B > A) / numel(B);
      CI = quantile(A-B,[0.025 0.975]);
      printf('comparisons: %1.3f ; CI: [%1.3f %1.3f]\n', comparisons, CI);
      % when CI does not include 0 --> H0 rejected, H1 true

  • @joxa6119
    @joxa6119 2 years ago

    So what happens exactly when we shift the data (so the mean will be 0)? Is there any formula for the data shift?

  • @HannahMeaney
    @HannahMeaney a year ago

    I don't understand how you got the actual p-value number. For example, the p-value of 0.47: how was that calculated?

    • @statquest
      @statquest  a year ago

      First off, the p-value is not 0.47, so that might be part of the problem. At 3:29 we have a histogram that tells us what would happen if the null hypothesis was true. Then at 3:36 we can calculate the percentage of means that were between -0.5 and 0.5 (this is just the number of means that we calculated that fell between -0.5 and 0.5 divided by the total number of means). This percentage was 36%, which also tells us that the probability of observing a mean between -0.5 and 0.5 is 0.36. Likewise, we then calculate the probability of observing a mean >= 0.5 plus the probability of observing a mean <= -0.5.

  • @DrThalesAlexandre
    @DrThalesAlexandre 2 months ago

    Amazing video!
    Any ideas on how to make bootstrapping run faster in Python? It starts lagging once you are doing > 10^5 trials with large sample sizes.

    • @statquest
      @statquest  2 months ago +1

      Good question...I'm not really sure, but with a large sample size, you might be able to get away with doing less bootstrapping.

    • @DrThalesAlexandre
      @DrThalesAlexandre 17 days ago

      @@statquest Thanks! There is probably some library that does this efficiently. I was just curious about how one could be implemented, but it's something that can be learned at another point in time.
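One common speed-up, assuming plain NumPy: draw all the resample indices in one call and reduce along an axis, instead of looping in Python. A sketch (sizes kept modest; for very large runs you would process the bootstrap replicates in chunks to limit memory):

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(size=500)    # stand-in for real measurements
n_boot = 10_000

# One (n_boot x n) matrix of resample indices, then one vectorized
# mean along axis 1 -- no Python-level loop over replicates.
idx = rng.integers(0, data.size, size=(n_boot, data.size))
boot_means = data[idx].mean(axis=1)
```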

  • @willw234
    @willw234 2 years ago

    Thanks for the very clear and informative description of this. I have a question - whenever the absolute value of the mean/median/statistic-of-interest of the original data is greater than the absolute value calculated from the shifted data, the p-value will be zero. I have a large set of tests to run and would like to do an FDR correction on the resultant set of p-values, but a not-insignificant number of them are zero. Is this still a legitimate thing to do?

    • @statquest
      @statquest  2 years ago

      I'm not sure I understand your problem, because each time you calculate a p-value you have to calculate the bootstrapped statistic. Are you saying that when the absolute value of every single bootstrapped statistic (and there should be > 10,000 of them) is greater than the original statistic, the p-value is 0? Well... if that is the case, and all 10,000 bootstrapped statistics are way far away from 0, then the p-value should be 0.

    • @willw234
      @willw234 2 years ago

      @@statquest Sorry, I probably didn't explain very well. For the shifted data, the largest possible mean of a bootstrap resample is just the largest value in the shifted data (which happens when it is chosen for every element of a resample). When the mean of the original unshifted data is larger than this, the p-value will be zero, regardless of the number of bootstrap resamples carried out. But this does not distinguish between cases when it is just a little bit larger, or very much larger. So if I have a lot of tests on independent data sets, I am concerned that the 'zero p-vaue' ones will be treated identically by the FDR procedure, when perhaps they shouldn't be??

    • @statquest
      @statquest  2 years ago

      @@willw234 Since you are just testing the mean, you might consider just using a one-sample t-test. Then your p-values will be more spread out.

    • @willw234
      @willw234 2 years ago +1

      @@statquest I will do that. I was just hoping to use the bootstrap so I could use the median instead of the mean. (btw I recently purchased your book on ML - very helpful, thank you!)

    • @statquest
      @statquest  Před 2 lety

      @@willw234 Awesome! Thank you!
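The zero-p-value issue in the thread above comes from the granularity of the bootstrap: with B resamples, the smallest non-zero p-value the procedure can report is 1/B, so "p = 0" really means "p < 1/B". Here is a minimal sketch of the shifted-data bootstrap p-value for a mean, in the spirit of the video; the data values are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical drug-response data (assumed values; their mean is 0.5,
# like the example in the video).
data = np.array([0.2, -0.1, 0.8, 1.1, 0.4, 0.9, -0.3, 1.0])
observed_mean = data.mean()

# Shift the data so they are centered on 0, matching the null
# hypothesis that the true mean is 0.
shifted = data - observed_mean

n_boot = 10_000
boot_means = np.array([
    rng.choice(shifted, size=len(shifted), replace=True).mean()
    for _ in range(n_boot)
])

# Two-sided p-value: the fraction of bootstrapped means at least as
# extreme as the observed mean. The smallest non-zero value this can
# take is 1 / n_boot.
p_value = np.mean(np.abs(boot_means) >= abs(observed_mean))
print(p_value)
```

Increasing `n_boot` shrinks the smallest reportable p-value, which is one way to spread out the near-zero p-values before an FDR correction.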

  • @zerocoll20
    @zerocoll20 Před 2 lety

    Is there any way to know how good this method is? I mean, comparing resampling with actual known statistics?

    • @statquest
      @statquest  Před 2 lety +1

      Yes, the same theory that we use to trust "normal" statistics (like t-tests and what not) also applies to bootstrapping. In other words, the theory that allows you to put trust into a t-test also suggests we should put trust in bootstrapping.

  • @mikhaeldito
    @mikhaeldito Před rokem

    When to use permutation over bootstrap (and the other way around) to calculate P-values?

    • @statquest
      @statquest  Před rokem

      If you have a relatively small dataset, you can use permutation. If it's relatively large, then you can use bootstrap.

    • @mikhaeldito
      @mikhaeldito Před rokem +1

      @@statquest BAM!
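To make the permutation-vs-bootstrap distinction in the thread above concrete: a permutation test relabels the pooled data without replacement, whereas bootstrapping resamples with replacement. A minimal two-group permutation test sketch (the group values are assumed, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical measurements for two groups (assumed values).
group_a = np.array([1.2, 0.9, 1.5, 1.1, 1.3])
group_b = np.array([0.4, 0.7, 0.2, 0.6, 0.5])
observed_diff = group_a.mean() - group_b.mean()

pooled = np.concatenate([group_a, group_b])
n_a = len(group_a)

n_perm = 10_000
perm_diffs = np.empty(n_perm)
for i in range(n_perm):
    # Shuffle the pooled values (sampling WITHOUT replacement) and
    # split them back into two groups of the original sizes.
    shuffled = rng.permutation(pooled)
    perm_diffs[i] = shuffled[:n_a].mean() - shuffled[n_a:].mean()

# Two-sided p-value from the permutation distribution.
p_value = np.mean(np.abs(perm_diffs) >= abs(observed_diff))
print(p_value)
```

With small samples all distinct relabelings can even be enumerated exactly, which is why permutation tests suit small datasets.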

  • @user-gj8vs1do9n
    @user-gj8vs1do9n Před rokem

    Hi Josh!
    How do we calculate critical value of statistic in this case?

    • @statquest
      @statquest  Před rokem

      If, for example, alpha = 0.05, then you can incrementally add the tails of the histogram together until you get 0.05. The last parts of this histogram added define the critical values.

    • @user-gj8vs1do9n
      @user-gj8vs1do9n Před rokem +1

      ​@@statquest Got it! Thank you!
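The tail-summing procedure described in the reply above is equivalent to reading off percentiles of the bootstrap distribution: for a two-sided test at alpha = 0.05, the critical values are the 2.5th and 97.5th percentiles. A sketch, using simulated values as a stand-in for the bootstrapped statistics:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a bootstrap distribution of a statistic under the null
# hypothesis (assumed: 10,000 bootstrapped means, simulated here for
# brevity instead of being resampled from real data).
boot_stats = rng.normal(loc=0.0, scale=0.18, size=10_000)

alpha = 0.05
# Trim alpha/2 from each tail of the distribution.
lower_crit, upper_crit = np.percentile(
    boot_stats, [100 * alpha / 2, 100 * (1 - alpha / 2)]
)

# Any observed statistic outside [lower_crit, upper_crit] would give
# a two-sided p-value below alpha.
print(lower_crit, upper_crit)
```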

  • @juanete69
    @juanete69 Před 2 lety

    I don't understand why you use the shifted data to perform the bootstrap. What if you don't "know" the null hypothesis but just your sample?

    • @statquest
      @statquest  Před 2 lety +1

      You don't have to shift the data, it just makes the math easier.

  • @SunSan1989
    @SunSan1989 Před 10 měsíci

    Perhaps because of the different ways of thinking between East and West, as an Asian I find it easier to understand without switching to a mean of zero, instead using -0.5 for "the drug has no effect", but doing so is somewhat inconsistent with the null hypothesis method. Good tutorial.
    There is another problem: in the example with the 0.63 probability, the probability of less than -0.5 is 0.16 and the probability of greater than 0.5 is 0.47, which seems a bit contradictory for bootstrapping done under the null hypothesis. If there are enough bootstrapping iterations, shouldn't the probability of less than -0.5 and the probability of greater than 0.5 be equal?

    • @statquest
      @statquest  Před 10 měsíci

      What time point, minutes and seconds, are you asking about?

    • @SunSan1989
      @SunSan1989 Před 10 měsíci

      Dear Josh, the time point is 4:07, where the probability of less than or equal to -0.5 is 0.16, and greater than or equal to 0.5 is 0.48 at time point 4:10. Is this probability a reasonable example? If bootstrapping is done enough times, shouldn't 0.16 be equal to 0.48?
      In addition, why can't the paper version of the book be sent to China? I bought it in Japan and transferred it from Japan to China. @@statquest

    • @statquest
      @statquest  Před 10 měsíci

      @@SunSan1989 My guess is that they will probably meet in the middle. As for my book, there should be a Chinese version (and translation) available in the next year. People are working on it.

    • @SunSan1989
      @SunSan1989 Před 10 měsíci

      Sorry, since my English is not very good, I want to confirm my understanding: should 0.16 be replaced with the same value as 0.48? Is this understanding correct? @@statquest

    • @statquest
      @statquest  Před 10 měsíci

      @@SunSan1989 No, I'm not sure what the value will be, but the sum will probably still add up to something close to 0.63

  • @mohamedsase7250
    @mohamedsase7250 Před rokem

    Can we use bootsrap to calculate confidence interval (%) for conditional event element like cross-tab element and how? Thank you

    • @statquest
      @statquest  Před rokem

      Probably, but I don't know what a cross-tab element is so it would be better to get someone else to answer.

    • @mohamedsase7250
      @mohamedsase7250 Před rokem

      @@statquest Cross-tab is something anyone who uses SPSS knows.
      It is a cross table, crossing two variables such as gender and healthy (yes or no), so you end up with 4 groups. I want to know if I can consider each group as an independent group and calculate the CI as normal.

    • @mohamedsase7250
      @mohamedsase7250 Před rokem

      Note: I have been searching for the answer for months, thank you a lot.

  • @acc3095
    @acc3095 Před 2 lety

    Is there a minimum sample size needed for bootstrap to be valid?

    • @statquest
      @statquest  Před 2 lety

      I think 8 might be a good starting point.

  • @jeffz7310
    @jeffz7310 Před 2 lety

    where did the 0.05 come from at 5:33 ? thank you

    • @statquest
      @statquest  Před 2 lety

      0.05 is the standard threshold for hypothesis testing. For details, see: czcams.com/video/vemZtEM63GY/video.html

  • @unlearningcommunism4742

    I gave it a try today. It's still not working / returning what I want it to return.

  • @accountname1047
    @accountname1047 Před 3 lety +1

    ah the elusive triple bam

  • @bobiq
    @bobiq Před rokem

    We fail to reject the hypothesis that the drug makes no difference. - a triple negation in one sentence is what makes statistics such a mind-bending exercise. Why can't this be expressed more easily?

    • @statquest
      @statquest  Před rokem

      Good point! Yes, classical statistics lends itself to a lot of awkward wording. Bayesian statistics attempts to make the language easier - and one of the ideas in this video, using computers to generate a lot of data, is a big step towards getting there.

  • @jasd100
    @jasd100 Před 2 lety +1

    My brother thought I was watching Blue's Clues, but stats edition

  • @alputkuiyidilli
    @alputkuiyidilli Před 2 lety

    1) Make a bootstrapped Dataset
    2) Calculate a statistic
    3)???
    4) Profit.

  • @redcat7467
    @redcat7467 Před 2 lety +1

    That was a bam with different statistics.

  • @engr.majidkaleem8810
    @engr.majidkaleem8810 Před rokem

    Could you please upload 5 unavailable hidden videos?

  • @PunmasterSTP
    @PunmasterSTP Před 3 měsíci +1

    Q: What's the significance of a urine test?
    A: The p-value!

    • @statquest
      @statquest  Před 3 měsíci +1

      Ugh! ;)

    • @PunmasterSTP
      @PunmasterSTP Před 3 měsíci

      @@statquest Q: What do claims adjusters use to estimate hail damage?
      A: Confi-dents intervals.

  • @shivverma1459
    @shivverma1459 Před 2 lety

    Let's say we don't look at the p-values and see that the 95% confidence interval is crossing 0 at 5:41. Then can't we say that the majority of means cross 0 and therefore the drug has been helping recovery instead of having no effect? I mean, from a confidence interval point of view.

    • @statquest
      @statquest  Před 2 lety +1

      This example is not great for discussing CIs because we shifted the data to be centered on 0. If we wanted to calculate a CI, we would do this: czcams.com/video/Xz0x-8-cgaQ/video.html

    • @shivverma1459
      @shivverma1459 Před 2 lety

      @@statquest ohkk thanks bam!

  • @cjh4467
    @cjh4467 Před 3 lety +3

    Why don't people just use bootstrapping for everything instead of worrying about robust standard errors and other types of similar concerns?

    • @statquest
      @statquest  Před 3 lety +2

      It's a good question. The answer, I believe, is "power". Bootstrapping works in all kinds of situations, but (I believe) it has less power than parametric methods.

    • @cjh4467
      @cjh4467 Před 3 lety +1

      @@statquest Thank you!

    • @SunSan1989
      @SunSan1989 Před 10 měsíci +2

      @@statquest That's a really good question, dear Josh. Can you make a video about the differences in power? Thank you for the tutorial. I appreciate it very much.

  • @gardaramadhito1650
    @gardaramadhito1650 Před 2 lety

    Isn’t this just randomization inference and you’re testing the sharp null hypothesis?

    • @statquest
      @statquest  Před 2 lety

      I believe they are different: jasonkerwin.com/nonparibus/2017/09/25/randomization-inference-vs-bootstrapping-p-values/

  • @yongkailiu1448
    @yongkailiu1448 Před 11 měsíci

    Could you make another video talking about one-sided tests?

    • @statquest
      @statquest  Před 11 měsíci

      You can just multiply the p-value by 2.

  • @yazanal-shoushie9929
    @yazanal-shoushie9929 Před 2 lety +1

    I love you

  • @dbuezas
    @dbuezas Před rokem +1

    Can you please meet 3blue1brown?
    If you two would do something together it would surely be glorious

    • @statquest
      @statquest  Před rokem

      That would be a dream come true. I wonder what the best way would be to introduce myself.

    • @dbuezas
      @dbuezas Před rokem +1

      @@statquest does asking your crowd to spam his comment section go against youtuber's etiquette? 😁

    • @statquest
      @statquest  Před rokem +1

      @@dbuezas I bet. Maybe we can find another way. I'll do what I can.

  • @saeidsas2113
    @saeidsas2113 Před 2 měsíci

    Hi Josh, I have a question, how I can contact you and ask my question?

    • @statquest
      @statquest  Před 2 měsíci

      If you have a question about my videos, the best place to ask it is right here, in the comments.

    • @saeidsas2113
      @saeidsas2113 Před 2 měsíci

      @@statquest Yes, but I need to write a bit of narrative to clarify my question, which is related to the bootstrap but not particularly to your nice video. I am a risk analyst working at a company and also doing my PhD in the field of actuarial science. We recently encountered an issue related to a model being used at the company.

    • @statquest
      @statquest  Před 2 měsíci

      @@saeidsas2113 Unfortunately I don't have time to do much consulting work. :(

    • @saeidsas2113
      @saeidsas2113 Před 2 měsíci

      @@statquest If you do not mind, I'll shoot my question here :) To begin with, I am a model validator, and one of our tasks is to ensure that a model works as expected and is fit for business purposes. To do so, back-testing is typically performed to check the model's performance. In a nutshell and in simple language, we have the following problem:
      A financial model generates thresholds at a confidence level of 90 percent. In order to check the model performance, it is important to count the number of defects over a given period which is usually 250 working days (i.e., one year). The defect is defined as below:
      A defect occurs if the relative market movement in 10 days is greater than the threshold, in other words:
      log(P_{t+10} / P_{t}) > v_t, where t = 1, 2, ..., 240, P_{t} is the market price at time t, and v_t stands for the threshold that comes out of the model. Note that the market movements are obtained on a rolling basis, so we have overlapping intervals. If we believe that the model works well, then one can expect the number of defects observed over 240 days to be 2.4 ~ 3 violations, because there is only a 1 percent chance of observing a defect, i.e., 240*0.01 = 2.4.
      Now let's consider the test hypothesis that needs to be done in order to back-test the model:
      Null hypothesis: p = 0.01
      Alternative hypothesis: p > 0.01
      where p is the probability of defect. Under the null hypothesis, the model works as expected because the probability of defect is 1% which is acceptable at the confidence level of 90 percent. Here are the steps taken to back-test the model
      1) Compute the spread which is the difference between the market movement and threshold, i.e., Spread = log(P_{t+10} /P_{t}) - v_t
      2) Generate 1000 synthetic samples each with size 240 from the original spreads while preserving the dependency structure, for example, the Maximum Entropy Bootstrap approach is applied in this stage.
      3) Count the number of positive spreads (indicating defects) for each synthetically generated sample.
      4) Obtain the defect ratio for each synthetically generated sample using (#defects)/240.
      5) Use the distribution of the generated defect ratios (i.e., the probability of defect) to find the p-value corresponding to the above hypothesis test. So, using p*_1, p*_2, ..., p*_1000 we calculate the following probability:
      p-value = P_H0( p > 0.01 ), which is approximated based on the distribution of p*_1, p*_2, ..., p*_1000.
      My question: here the quantity under consideration is the probability of a defect, or we could consider the defect rate. If the observed defect rate in the original data set is greater or less than 0.01, then we need to apply a transformation, like what you did for the mean where you shifted the data to get a zero mean, to make the ratio equal to 0.01, and then generate samples from spreads for which the defect ratio is 0.01 to compute the probability of being greater than 0.01 under the alternative hypothesis, right?

    • @saeidsas2113
      @saeidsas2113 Před 2 měsíci

      @@statquest It is fine; however, I already asked my question, and I think it is interesting to take into account. Feel free to answer it. Thank you for your time.

  • @drachenschlachter6946

    How do you shift the data?

    • @statquest
      @statquest  Před rokem

      At 2:29 I say that we shift the data to the left by 0.5 units (where 0.5 is the mean of the data). That means we subtract 0.5 from each value in the dataset.

    • @drachenschlachter6946
      @drachenschlachter6946 Před rokem

      @@statquest But why, Josh? If you have the bootstrap distribution and you calculate the 95% confidence interval, you can say whether the hypothesis can be rejected or not, right? If 0 is in the interval, then it can't be rejected. So why shift the data, if it doesn't matter?

    • @statquest
      @statquest  Před rokem

      @@drachenschlachter6946 Because this video is talking about how to calculate p-values, not confidence intervals. The first bootstrapping video describes confidence intervals (and does not require shifting the data): czcams.com/video/Xz0x-8-cgaQ/video.html
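To make the reply above concrete: "shifting" just means subtracting the sample mean from every value, so the shifted data are centered on 0, the value the null hypothesis claims for the true mean. A tiny sketch (the data values here are assumed, with a mean of 0.5 as in the video):

```python
import numpy as np

# Hypothetical sample whose mean is 0.5 (assumed values).
data = np.array([0.2, -0.1, 0.8, 1.1, 0.4, 0.9, -0.3, 1.0])

# Shift left by the sample mean (0.5), i.e., subtract it from each value.
shifted = data - data.mean()

# The shifted sample mean is 0 (up to floating-point error), which is
# exactly the mean asserted by the null hypothesis.
print(shifted.mean())
```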

  • @ilusoeseconomicas2371

    There is no reason to subtract the mean of the distribution before bootstrapping and then add it back later. Just bootstrap the original data and see where the original mean falls in the generated histogram.

    • @statquest
      @statquest  Před rokem

      I shifted the data because the null hypothesis is that the "true mean" is 0, and it's helpful to see how the statistic would be distributed around 0 in that case.

  • @chrislam1341
    @chrislam1341 Před 2 lety

    I cannot understand why we care about the region of -0.5.
    Given data with mean 0.5 and variance v, how likely is it that I see this data if the true mean is 0? Let's assume the data come from a normal distribution, N:
    p-value = P(mean >= 0.5 | N(0, v))
    if p-value <= 0.05: it is unlikely that H0 is true => reject H0
    if p-value > 0.05: it is likely that H0 is true => cannot reject H0
    Where is the role of -0.5 here?

    • @statquest
      @statquest  Před 2 lety

      I almost always use two-sided p-values, and I explain the reasons here: czcams.com/video/JQc3yx0-Q9E/video.html