False Discovery Rates, FDR, clearly explained

  • Uploaded Jan 9, 2017
  • One of the best ways to prevent p-hacking is to adjust p-values for multiple testing. This StatQuest explains how the Benjamini-Hochberg method corrects for multiple testing and controls the False Discovery Rate (FDR).
    For a complete index of all the StatQuest videos, check out:
    statquest.org/video-index/
    If you'd like to support StatQuest, please consider...
    Buying The StatQuest Illustrated Guide to Machine Learning!!!
    PDF - statquest.gumroad.com/l/wvtmc
    Paperback - www.amazon.com/dp/B09ZCKR4H6
    Kindle eBook - www.amazon.com/dp/B09ZG79HXC
    Patreon: / statquest
    ...or...
    CZcams Membership: / @statquest
    ...a cool StatQuest t-shirt or sweatshirt:
    shop.spreadshirt.com/statques...
    ...buying one or two of my songs (or go large and get a whole album!)
    joshuastarmer.bandcamp.com/
    ...or just donating to StatQuest!
    www.paypal.me/statquest
    Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
    / joshuastarmer
    #statistics #pvalue #fdr

Comments • 427

  • @statquest
    @statquest  2 years ago +5

    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

  • @ronnieli0114
    @ronnieli0114  3 years ago +205

    My PhD dissertation relies heavily on bioinformatics and biostatistics, although my background is neuroscience. Naturally, I had a lot of learning to do, and your videos have helped me immensely. Every time I want to learn about a stats concept, I always type in my Google search, "[name of concept] statquest." Seriously, this is almost too good to be true, and I just wanted to thank you for providing this absolute gold mine.

    • @statquest
      @statquest  3 years ago +13

      Wow! Thank you very much and good luck with your dissertation.

  • @simonpirlot2720
    @simonpirlot2720  4 years ago +93

    You make without a doubt the best videos about statistics on CZcams: funny, clear, intuitive, visual. Thank you so much.

  • @dysnomia6413
    @dysnomia6413  4 years ago +19

    God bless you, I made screenshots of this video to explain this concept to my lab. This isn't the first time you've helped me with RNA-seq procedures. I have bumbled through a differential expression analysis. Trying to understand the statistical methods and knowing which option amongst several is the most logical is a mental hurdle. I am the only student in my lab currently undertaking bioinformatics and I am essentially trying to teach myself. There is a huge vacuum of knowledge in this realm amongst biologists and it's daunting. We all can generate data until we're blue in the face, but it doesn't do anyone any good until someone knows how to analyze it.

    • @statquest
      @statquest  4 years ago +3

      Awesome! Good luck learning Bioinformatics.

  • @meg7617
    @meg7617  3 years ago +13

    Can't thank you enough!! Your methods are truly amazing. Being able to deliver them to us so cleverly is a true indication of how much effort you must have put into understanding these concepts.

  • @didismit1766
    @didismit1766  5 years ago +31

    BAM BAM BAM, thanks a lot man... Your 20 minutes most likely saved hours of trying to understand from Wikipedia...

    • @statquest
      @statquest  5 years ago +1

      Sweet!!! Glad I could help you out. :)

  • @Demonithese
    @Demonithese  7 years ago +9

    Fantastic video, thank you for taking the time to put this together.

  • @wenbaoyu
    @wenbaoyu  3 years ago +6

    Wow wow wow how intuitive and visual. Can’t thank you enough for saving me from spending hours struggling to understand this concept🙏

  • @zebasultana930
    @zebasultana930  6 years ago +1

    Awesome explanation!! Thanks for taking the time to make these videos and also answering questions from viewers so well. Going through them already answered some queries that I had :)

  • @user-gx3eg5sz9n
    @user-gx3eg5sz9n  3 years ago +3

    I'm from China and I watched your channel on Bilibili, but I couldn't get enough, so I followed you all the way and ended up here, a paradise of data science! Thank you Josh, wish you the best!

  • @fmetaller
    @fmetaller  4 years ago +3

    I love you ❤️. I was so afraid of FDR adjustment because I thought the math behind it was empirical and worked like magic, but you made it surprisingly intuitive.

  • @kakusniper
    @kakusniper  6 years ago

    I'm currently learning to do RNA-seq data analysis; these videos are extremely helpful.

  • @daeheepyo3053
    @daeheepyo3053  5 years ago +6

    OMG!! This is the most beautiful explanation I've ever experienced...... Thank you so much professor.

  • @docotore
    @docotore  3 years ago +3

    Simple, informative, and to the point. Absolutely perfect.

  • @chunhuigu4086
    @chunhuigu4086  4 years ago +5

    Great tutorial for FDR. The adjusted p-value is a p-value for the results that remain after cutting off the results you know are not significant just from the distribution. It would be even better if you could say something about the q-value and how the q-value reflects the quality of an experiment.

  • @yanggao8840
    @yanggao8840  4 years ago +3

    I have always hated math and you just make it clear and interesting! Can't thank you enough

    • @statquest
      @statquest  4 years ago

      Hooray!!! I'm glad the video is helpful. :)

  • @loretaozolina8414
    @loretaozolina8414  2 years ago +2

    Thank you! This was really helpful and made me smile during my intense evening revision :)

  • @RobertWF42
    @RobertWF42  6 years ago +1

    Cool, thanks for posting this, very intuitive! An equivalent method for eyeballing the # of true null hypotheses is to plot ranked 1 - p-value on the x-axis and the hypothesis test rank on the y-axis, then fit a line to the scatter plot, starting at the origin. Where the line hits the y-axis is your estimate of the # of true null hypotheses. Would like to see an intuitive explanation for the Benjamini-Yekutieli procedure, used in studies where the tests are not completely independent!

  • @niklasfelix7126
    @niklasfelix7126  4 years ago +6

    Thanks for the awesome explanation! Really informative and easy to follow. And the DOUBLE BAM in the end actually made me laugh out loud :D

  • @broken_arrow1813
    @broken_arrow1813  5 years ago

    The clearest explanation of BH correction so far. Quadruple BAM!

  • @FengyuanHu
    @FengyuanHu  6 years ago

    This is simply great!!! Thanks for sharing Joshua.

  • @shanoodi
    @shanoodi  7 years ago

    This is the best video that explains FDR. Thank you.

  • @ryanruthart581
    @ryanruthart581  6 years ago

    Great video, your example was clear and very well illustrated.

  • @hossam86
    @hossam86  3 years ago +2

    This is amazing. Very well explained and easy to understand!

  • @AnkitDhankhar-uv6qd
    @AnkitDhankhar-uv6qd  1 month ago +1

    First and foremost, I extend my heartfelt gratitude for providing such a series that elucidates concepts in an easily comprehensible manner. Bam !☺

  • @frrraggg
    @frrraggg  1 year ago +2

    As always, by far the best explanation on the web!

  • @ericshaker9377
    @ericshaker9377  3 years ago +1

    Wow, I was seriously struggling with my research since I don't know the first thing about statistics, and I love this so, so, so much. So instructional I had to like.

  • @tinAbraham_Indy
    @tinAbraham_Indy  2 years ago +1

    Thank you very much indeed for the perfect explanation and examples of the FDR concept. I really got my answer.

  • @ramazanaitkaliyev8248
    @ramazanaitkaliyev8248  1 month ago +1

    Great explanation, thanks! Clear, with an amazing balance between theory and examples.

  • @archanaydv995
    @archanaydv995  5 years ago +1

    Just wow!! Thank you for this.

  • @li-wenlilywang8856
    @li-wenlilywang8856  6 years ago

    Thank you so much for this great movie!! Great explanation.

  • @fgfanta
    @fgfanta  7 days ago +1

    From the way my university teachers (didn't) explain to me Benjamini-Hochberg, and after watching this video, I can claim I now understand Benjamini-Hochberg better than them, at a 99.7% confidence level!

  • @abdullahalfarwan1458
    @abdullahalfarwan1458  3 months ago +1

    Thanks, Josh. You did a great job. A short and useful video.

  • @diegocosta2383
    @diegocosta2383  3 years ago +1

    Nice video, simple and fast.

  • @reflections86
    @reflections86  1 year ago +2

    Josh is a genius. Really appreciate your work, StatQuest.

  • @ieserbes
    @ieserbes  7 months ago +1

    As always, it is a great explanation. Thank you Josh 👏

  • @PedroRibeiro-zs5go
    @PedroRibeiro-zs5go  6 years ago

    Dude thanks so much, this video is AWESOME!!!

  • @weihe3639
    @weihe3639  6 years ago

    Very nice explanation!

  • @rodrigohaasbueno8290
    @rodrigohaasbueno8290  5 years ago +1

    I have to keep saying that I love this channel so much

    • @statquest
      @statquest  5 years ago

      Hooray!!! Thank you so much!!! :)

  • @annawchin
    @annawchin  3 years ago +1

    This was SUPER helpful, thank you!

  • @adelinemorez8072
    @adelinemorez8072  11 months ago +2

    I love you StatQuest. Thank you for never letting me down. You were always present to answer my deepest and most shameful doubts. You never abandoned me during the darkest hours of my PhD.

    • @statquest
      @statquest  11 months ago +1

      I'm so happy to hear my videos helped you. BAM! :)

  • @afraamohammad1001
    @afraamohammad1001  4 years ago +1

    Thanks for your effort and simplified explanation!!! Life saver :))

  • @junymen223
    @junymen223  7 years ago

    Thanks a lot, Mr. Joshua.

  • @user-dk4ss4gp3l
    @user-dk4ss4gp3l  1 year ago +1

    This is my first time fully understanding FDR ...

  • @agnellopicorelli4751
    @agnellopicorelli4751  3 years ago +1

    I just love your videos. Thank you so much!

  • @RavindraThakkar369
    @RavindraThakkar369  2 years ago +1

    Nicely explained.

  • @telukirIY
    @telukirIY  6 years ago

    Good explanation

  • @karinamatos4253
    @karinamatos4253  3 years ago +1

    Great explanations!

  • @maryamsediqi3625
    @maryamsediqi3625  3 years ago +1

    Thank you sir, it was very useful 🙏

  • @timokvamme
    @timokvamme  3 years ago +1

    Nice explanation!

  • @barbaramarqueztirado7567
    @barbaramarqueztirado7567  2 years ago +1

    Thank you very much for the explanation, very, very clear!!

  • @kezhang1460
    @kezhang1460  3 years ago +2

    BAM!!! Finally I understand it, after it confused me for half a year!!

  • @ygbr2997
    @ygbr2997  1 year ago +1

    the second half is hard to understand, but I know I will come back later and watch it again, and again, and again until I finally understand it

    • @statquest
      @statquest  1 year ago

      Let me know if you have any specific questions.

  • @yoniashar3179
    @yoniashar3179  5 years ago +1

    This is a great video. Could you help me understand how the intuitive understanding (the histograms of p-values coming from two distributions) connects to the mathematical steps of the B-H procedure? Thank you!

  • @isaiasprestes
    @isaiasprestes  6 years ago +83

    1 thumb down is a case of FDR :)

  • @ilveroskleri
    @ilveroskleri  4 years ago +1

    Thanks, that was precious (and spared me hours of frustration)

  • @poiskkirpitcha2003
    @poiskkirpitcha2003  4 years ago +1

    Thank you, bro!

  • @sunjulie
    @sunjulie  3 years ago +2

    It's so good, I want to give it more than one thumb up!

  • @noahsplayground2564
    @noahsplayground2564  3 years ago

    Hey Josh, love you videos on stats, specifically centered around hypothesis testing. Can you do more videos on the different techniques of hypothesis testing, like (group) sequential testing and multi-armed bandit?

  • @unavaliableavaliable
    @unavaliableavaliable  1 year ago +1

    This video is so beautiful.. Thank you so much

  • @arem2218
    @arem2218  3 years ago +1

    Thank you, nicely explained

  • @worldofinformation815
    @worldofinformation815  3 years ago +1

    Thank you Sir🌹

  • @RobertWF42
    @RobertWF42  6 years ago

    One part I don't quite understand is how the intuitive eyeball method translates into the B-H p-value adjustments you explain starting at ~15:00. To me, plotting a line along the H0 = True p-values sounds like you would be fitting a linear regression & identifying the outliers < .05.

  • @karolnowosad886
    @karolnowosad886  3 years ago +1

    I love the explanation!

  • @torquehan9404
    @torquehan9404  1 year ago +2

    I don't understand one thing. If samples are taken from the same population, wouldn't the p-value bins NOT be evenly distributed, but rather skewed toward p = 1? The population is normally distributed, so most of the time samples close to the average are likely to be picked.

    • @statquest
      @statquest  1 year ago +1

      By definition, p-values are uniformly distributed under the null. By definition, a p-value of 0.05 means that 5% of the random tests will give results equal to or more extreme; a p-value of 0.1 means 10%, etc.

    • @torquehan9404
      @torquehan9404  1 year ago +1

      Thanks a lot!

  • @rongruo2624
    @rongruo2624  4 years ago +3

    I'd like to know why when samples come from the same distribution, the p values are uniformly distributed? Thank you!

  • @vaibhavijoshi6443
    @vaibhavijoshi6443  4 years ago +1

    This is amazing. thank youu.

  • @BadalFamily
    @BadalFamily  4 years ago +1

    Hi Josh! Great stuff here. Could you please make a video on "Significance Analysis of Microarrays"? Mainly how it differs from t-stat/ANOVA. Really appreciate you for all the videos.

    • @statquest
      @statquest  4 years ago +1

      I'll keep it in mind, but I can't promise I'll get to it soon.

  • @biancaphone
    @biancaphone  5 years ago +1

    Would love a video about the target decoy approach

    • @statquest
      @statquest  5 years ago

      OK. I've added it to the to-do list. :)

  • @congchen170
    @congchen170  7 years ago

    Very nice video, and I learned a lot from it. The only thing is, in your examples you told us that when you calculate p-values 10,000 times, the distribution of p-values will look like this or like that, but I don't know whether that's true or not. So I am wondering, can you explain a little bit more, or is there any further reading I can do about p-values and adjusted p-values?

  • @dingdingdingwen
    @dingdingdingwen  2 years ago +1

    Great channel and fantastic content! I am wondering if you could make an episode about IDR, Irreproducible discovery rate. It is difficult to find a good explanation or usage guide on it.

  • @ucheogbede
    @ucheogbede  1 year ago +1

    This is very great!!!

  • @hedaolianxu2748
    @hedaolianxu2748  4 years ago +1

    AWESOME! Thank you!

  • @thomasalderson368
    @thomasalderson368  6 years ago +1

    thanks josh!

    • @statquest
      @statquest  6 years ago

      You are welcome!!! I'm glad you like the video! :)

  • @Ken-vp6xc
    @Ken-vp6xc  5 years ago +9

    Hey, thanks for the video. Just a question: don't you have a higher chance of getting samples from the middle of the distribution than from the tails, resulting in more large p-values than small ones? I don't get why p-values are uniformly distributed. Thanks :)

    • @statquest
      @statquest  5 years ago +10

      You know, I found this puzzling as well. However, imagine we are taking two different samples from a single normal distribution. If we did a t-test on those samples, 5% of the time the p-value would be less than 0.05. Now imagine we created 100 random sets of samples and did 100 t-tests. 5 of those p-values will be less than 0.05. 10 will be less than 0.1, 15 will be less than 0.15... 50 will be less than 0.5... 90 will be less than 0.90, etc. This isn't a mathematical proof, but it makes sense - the whole idea of having any p-value threshold x is that we only expect x percent of the tests with random noise to be below that threshold. Thus, we have a uniform distribution of p-values.

    • @RobertWF42
      @RobertWF42  5 years ago

      Also keep in mind that when computing p-values for the difference between two sample means, p-values of .05 or less cover a wider range of x values than say p-values between .50 and .55.

    • @Tbxy1
      @Tbxy1  3 years ago +3

      @@statquest Wow, I had the same question as Ken. Thanks for giving this super intuitive explanation!

    • @lizheltamon
      @lizheltamon  1 year ago

      @@Tbxy1 me too! been struggling to understand that part and thank god Ken asked 😅
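
    The 100-t-test thought experiment in the reply above can be checked with a quick simulation. This is just an illustrative sketch (not from the video): it substitutes a two-sample z-test with known variance for the t-tests, so the p-value can be computed with the Python standard library alone.

    ```python
    import math
    import random

    random.seed(42)

    def two_sample_z_pvalue(a, b):
        """Two-sided z-test for equal means, assuming both samples have variance 1."""
        n = len(a)
        z = (sum(a) / n - sum(b) / n) / math.sqrt(2.0 / n)
        # Standard-normal two-sided tail probability via erf
        return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

    # 10,000 tests where the null hypothesis is true: both samples
    # come from the same N(0, 1) distribution.
    pvals = []
    for _ in range(10_000):
        a = [random.gauss(0, 1) for _ in range(20)]
        b = [random.gauss(0, 1) for _ in range(20)]
        pvals.append(two_sample_z_pvalue(a, b))

    # Roughly 5% of p-values fall below 0.05, 50% below 0.5, 90% below 0.9
    for t in (0.05, 0.5, 0.9):
        print(t, sum(p < t for p in pvals) / len(pvals))
    ```

    The observed fractions land close to the thresholds themselves, which is exactly the uniform distribution of null p-values the reply describes.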

  • @zijianchen4775
    @zijianchen4775  5 years ago

    This is crystal clear about FDR and the BH method, much clearer than what my professor said.

  • @StephenRoseDuo
    @StephenRoseDuo  6 years ago

    Awesome, this may be too niche but could you do a video on local FDR please?

  • @thiagomaiacarneiro2829
    @thiagomaiacarneiro2829  1 year ago +2

    Great video! Congratulations. I've read the Benjamini and Hochberg (1995) paper, but (guided by my very limited knowledge of math) I was not able to find the formula in the form you explained it. Please, could you give some clarification on this issue, such as what kind of transformation of the mathematical procedure is involved? Thank you very much. Best wishes.

    • @statquest
      @statquest  1 year ago +1

      I'll keep that in mind.

    • @donnizhang5960
      @donnizhang5960  5 months ago

      I have the same questions. Did you figure out the logic behind the mathematical procedure? Thank you!

  • @mihaellid
    @mihaellid  4 years ago +2

    BAMMMM! Thank you!

    • @statquest
      @statquest  4 years ago

      Hooray! I'm glad you like the video. :)

  • @tysonliu2833
    @tysonliu2833  5 months ago +1

    I think you previously talked about how to calculate the p-value for one sample set, which tells us how likely it is that the sample set belongs to a distribution. But here we are calculating the p-value for two sample sets and trying to tell whether they belong to the same distribution. How is that calculated? Or is it simply comparing each sample set to the distribution, and if they both likely belong to the same distribution, we say we fail to reject the null hypothesis?

    • @statquest
      @statquest  5 months ago

      In this video I believe I'm using t-tests. To learn about those, first learn about linear regression (don't worry, it's not a big deal): czcams.com/video/nk2CQITm_eo/video.html and then learn how to use linear regression to compare two samples to each other with a t-test: czcams.com/video/NF5_btOaCig/video.html

  • @mihirgada5585
    @mihirgada5585  1 year ago +1

    Thanks for these videos! They are great!!
    Can you help me understand the intuition behind why the p-values are uniformly distributed in the samples from the same distribution?

    • @statquest
      @statquest  1 year ago

      Think about how p-values are defined. If there is no difference, the probability of getting a p-value between 0 and 0.05 is... 0.05. And the probability of getting a p-value between 0.05 and 0.1 is also 0.05, etc.

  • @thomasmatthew9515
    @thomasmatthew9515  7 years ago

    Question on the application of the B-H method: I have a distribution of p-values and KS D-values from comparing two distributions: 1) a distribution of transcriptional changes (observed), and 2) a distribution of transcriptional changes formed from random shuffling (null). I wish to adjust the p-values to weed out any false positives. When I rank the p-values, can I simply choose all p-values in the "< 0.05 bin" of the observed distribution? That kind of mimics what you did in the first example starting @ 14:47. But in the second example @ 17:07, how did you actually compute the adjusted p-values? Did you just repeat the method on the blue boxes (observed) and on the red boxes (null) separately? Thanks, and keep up the great videos!

    • @thomasmatthew9515
      @thomasmatthew9515  7 years ago

      That makes sense. Your approach eliminates p-value adjustment: just select a cutoff where no more than 5% of the combined (and sorted) p-values come from the permuted set. Then for any p-value from that combined set I can say "this p-value has an FDR of

    • @thomasmatthew9515
      @thomasmatthew9515  7 years ago

      Joshua Starmer I'll try all three and see which samples get eliminated. Thanks again for your feedback, you're more helpful than most of my professors!

  • @JadAssaf
    @JadAssaf  6 years ago

    Thank you so much.

    • @statquest
      @statquest  6 years ago

      Hooray! I'm glad you like the video! :)

    • @JadAssaf
      @JadAssaf  6 years ago

      I've been reading publications for an hour and you solved my problem in 10 minutes.

    • @statquest
      @statquest  6 years ago

      Awesome!!! This is definitely one of those things that's easier to "see" than to read about. Glad I could help. :)

  • @krisdang
    @krisdang  7 years ago

    This is awesome. Imma save it for later reference hah

  • @TaylanMorcol
    @TaylanMorcol  1 year ago +1

    Hi Dr. Josh, I'm curious to get your thoughts on a simulation I'm running. It's very similar to the simulation in this video where you calculate 10,000 p-values by sampling from the same distribution.
    When I run my simulation using a Welch t-test and n=3, only ~3.5% of p-values are less than 0.05. The percentage converges on 5% when I increase the sample size or use the Student's t-test.
    It seems as though forgoing the equal variances assumption sacrifices some power, especially at low sample sizes. But I'm still trying to grasp why that is and what the implications are for using the Welch t-test with low sample size in real-life situations. For example, if the null hypothesis is that both samples come from the same population, then why not just assume equal variances and use Student's t-test all the time? (I know that last question is probably conflating some concepts that should be separate, but I'm having a hard time keeping track of it all, and I'm really interested to hear how you would respond to that question).
    You seem to have a great way of explaining things like this intuitively. I'm curious to hear your thoughts.
    Thanks so much! I've benefited greatly from your videos.

    • @statquest
      @statquest  1 year ago +1

      It makes sense to me that Welch's t-test has less power with low sample sizes because it makes fewer assumptions - and thus has to squeeze more out of the data by estimating more parameters.

  • @TheRonakagrawal
    @TheRonakagrawal  8 months ago +1

    @statquest: Josh, Thank you. I have a follow-up though. Sure, we could adjust the p-values to reduce the False positives, but could this adjustment cause an increase in False negatives? Is there a way to quantify that? Apologies if I am missing something obvious.

    • @statquest
      @statquest  8 months ago

      There are different methods to control the number of false positives, some do a better job than others at keeping the number of false negatives small. FDR is one of the best methods for limiting both types of errors. In contrast, the Bonferroni correction is one of the worst.
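
      The difference the reply above describes can be seen on a small worked example. The 15 p-values below are hypothetical (not from the video), chosen only to show the two cutoffs side by side:

      ```python
      alpha = 0.05
      # 15 sorted p-values from a hypothetical multiple-testing experiment
      pvals = [0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298,
               0.0344, 0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.0000]
      m = len(pvals)

      # Bonferroni: a test is significant only if p < alpha / m
      bonferroni_hits = sum(p < alpha / m for p in pvals)

      # Benjamini-Hochberg step-up: find the largest rank k with
      # p_(k) <= k * alpha / m, then call the k smallest p-values significant
      bh_hits = 0
      for rank, p in enumerate(pvals, start=1):
          if p <= rank * alpha / m:
              bh_hits = rank

      print(bonferroni_hits, bh_hits)  # Bonferroni keeps 3 discoveries, BH keeps 4
      ```

      Bonferroni's single cutoff (alpha/m = 0.0033) drops the 0.0095 result, while BH's rank-scaled cutoffs keep it: fewer false negatives for the same false-positive control.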

  • @Priestessfly
    @Priestessfly  3 years ago +1

    great video

  • @urjaswitayadav3188
    @urjaswitayadav3188  6 years ago

    Great video! I have a question on distribution of p-values: I am doing a likelihood ratio test and calculating significance p-values from Chi-square test. I see that the distribution of my uncorrected p-values is not uniform near p-value 1. It has a large peak at p-value 1 i.e. most of my data-points has p-value of 1. Do you have any insights on how that might happen? And what can be the best way to correct for multiple hypothesis testing in this case. Because, using BH, I lose all the significance :( Thanks!

  • @chimiwangmo1512
    @chimiwangmo1512  3 months ago

    Thank you for the intuitive video. I am awfully new to statistics, so I have three questions. Suppose it is a classification problem: 1. Are "samples" referred to as "classes" (types of genes), or are they samples of genes? 2. Will the null hypothesis be that there is no dependency between the gene and the samples? 3. Why 10,000 times? (I am a bit confused about the relationship between 10,000 genes and 10,000 tests, as I understand that for each test the distribution plot is based on values of genes.)

    • @statquest
      @statquest  3 months ago

      1) I'm not sure I understand the question because we are trying to classify the expression as being "the same" or "different" between two groups of mice or humans.
      2) The null hypothesis is that all of the measurements come from the same population.
      3) When we do this sort of experiment, we test between 10,000 and 20,000 genes to see if they are expressed the same or different between two groups of mice or humans or whatever. So, for each gene in the genome, we do a test to see if it is the same or different. This allows us to identify genes that play a role in cancer or some other disease.

  • @TheJosephjeffy
    @TheJosephjeffy  2 years ago

    I am glad to see this video, as I am doing some FDR tests in my project. I have a question: what if false positive samples remain after adjustment? Is it still acceptable if the FDR is < 0.05?

    • @statquest
      @statquest  2 years ago

      You cannot eliminate false positives, but you can use FDR to control how many there are. So typically people call all tests with FDR < 0.05 "significant".

  • @jbeebe2
    @jbeebe2  6 years ago

    Thanks

  • @zeyads.el-gendy4227
    @zeyads.el-gendy4227  3 years ago +1

    I truly love you...

  • @bzaruk
    @bzaruk  2 years ago

    At 6:45, when you mentioned the p-value of 3 technical samples - how do you calculate a p-value of 3 technical samples into one number? Do you average them before calculating the p-value? Sum them up? Or average the p-values of each of the 3 technical samples?

    • @statquest
      @statquest  2 years ago

      I'm not sure I understand your question. However, essentially what I'm saying at that point is that we start with a single normal distribution and randomly select values from it (for details, see: czcams.com/video/XLCWeSVzHUU/video.html ), then we perform a statistical test (for example, a t-test) to calculate the p-value. We then repeat this process 10,000 times and create a histogram of the p-values. This will create a histogram of p-values for when the null hypothesis is true (for details, see: czcams.com/video/0oc49DyA3hU/video.html )

  • @karimnaufal9792
    @karimnaufal9792  4 years ago +1

    Holy freaking nuts!! Thank you haha...

  • @zihanyang7565
    @zihanyang7565  4 years ago

    Could you kindly explain the post hoc tests for ANOVA?

  • @sergiooterinosogo4286
    @sergiooterinosogo4286  3 years ago +1

    Thank you for your very helpful video. I have one question here: what I have understood from the calculation of the FDR is that it will make only the smaller p-values still significant after the correction, am I right? (You suggested it at 12:09.) Nevertheless, I got distracted at 17:20 because there are smaller values in the red area that, based on this, would not be "false positives", if I got your explanation. Could you clarify this? Thank you :)

    • @statquest
      @statquest  3 years ago +1

      The numbers in the blue boxes are p-values that were created from two separate distributions. Some of those p-values are below the standard threshold of 0.05 and some are not. The ones that are not are "false negatives". The numbers in the red boxes are p-values that were created from a single distribution. Some of those p-values are below the standard threshold of 0.05 and some are not. The ones below the threshold are false positives. However, in this specific example, after we apply the BH procedure (at 18:02 ), all of the false positives end up with p-values > 0.05 and are no longer considered statistically significant, so the false positives are eliminated.
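
      For anyone who wants to see this adjustment as code, here is a minimal sketch of the BH adjusted p-values (the example p-values are made up, not from the video): each p-value is multiplied by the number of tests and divided by its rank, and the results are then made monotone from the largest rank down.

      ```python
      def bh_adjust(pvals):
          """Benjamini-Hochberg adjusted p-values (often reported as q-values)."""
          m = len(pvals)
          order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
          adjusted = [0.0] * m
          running_min = 1.0
          for rank in range(m - 1, -1, -1):  # walk from the largest p-value down
              i = order[rank]
              running_min = min(running_min, pvals[i] * m / (rank + 1))
              adjusted[i] = running_min
          return adjusted

      # Four borderline p-values (all < 0.05) among ten tests:
      raw = [0.01, 0.02, 0.03, 0.04, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
      adj = bh_adjust(raw)
      print(adj[:4])  # all four become ~0.1, so none survive an FDR < 0.05 cutoff
      ```

      This mirrors the point at 18:02: after adjustment, borderline p-values that came from the null can end up above 0.05 and stop being called significant.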

  • @yuyangluo7292
    @yuyangluo7292  3 years ago +3

    i love how he made that joke about wild type in a monotone lol

  • @annas.1403
    @annas.1403  5 years ago

    Hey, sorry to bother you (or anyone else who reads this comment), but I am currently trying to understand the connection between FDR and p-hacking. I am not sure if I understood this right, but:
    Can an inflated FDR appear when researchers try to get a significant result through multiple comparisons by running more than one independent test on the same data set?
    Or have I misunderstood FDR completely?

  • @chadmoon3139
    @chadmoon3139  10 months ago +1

    Awesome!!

  • @oliveros9
    @oliveros9  6 years ago +1

    1000 thanks!
    One naive question: why is the distribution of p-values flat when testing samples taken from the same distribution? I would rather have expected a distribution skewed toward high p-values (non-significant).
    Thanks again!

    • @oliveros9
      @oliveros9  6 years ago

      Thanks!
      In fact, we just made a simulation (in R language) and we obtained the described behaviour (flat distribution). And it is true for any number of replicates. Your explanation is crystal clear to me. Thanks again! Nice channel!

    • @mausunk
      @mausunk  4 years ago

      Bump, I have the exact same question