StatQuest: edgeR, part 1, Library Normalization

  • Added April 2, 2017
  • edgeR, like DESeq2, is a complicated program used to identify differentially expressed genes. Here I clearly explain how it normalizes libraries.
    For a complete index of all the StatQuest videos, check out:
    statquest.org/video-index/
    If you'd like to support StatQuest, please consider...
    Buying The StatQuest Illustrated Guide to Machine Learning!!!
    PDF - statquest.gumroad.com/l/wvtmc
    Paperback - www.amazon.com/dp/B09ZCKR4H6
    Kindle eBook - www.amazon.com/dp/B09ZG79HXC
    Patreon: / statquest
    ...or...
    YouTube Membership: / @statquest
    ...a cool StatQuest t-shirt or sweatshirt:
    shop.spreadshirt.com/statques...
    ...buying one or two of my songs (or go large and get a whole album!)
    joshuastarmer.bandcamp.com/
    ...or just donating to StatQuest!
    www.paypal.me/statquest
    Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
    / joshuastarmer
    #statquest #rnaseq #edger

Comments • 59

  • @statquest  2 years ago

    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

  • @dany271197  5 years ago  +5

    By the way, you did a great job explaining statistical analysis for dummies in a very nice way!!

  • @tkdlvk27  4 years ago  +2

    Wow... this is an amazing method, and so is your explanation.

  • @elizabethblears2919  6 years ago

    This is so helpful! Thank you! Keep up the good work!

  • @sunnetinternationalbusines9910

    Thanks for the in-depth explanation

  • @reytns1  6 years ago  +1

    Dear Joshua, I have watched your video; it is really interesting and helpful for people who are new to the RNA-seq world. I have a question related to normalization: is there any relation between edgeR and the hypergeometric distribution?

  • @liangcheng7824  5 years ago

    This is really great! I'm a little confused, though: don't people use conserved genes with relatively steady expression levels as references to normalize their data?

  • @brettvanderwerff6917  6 years ago  +1

    This is amazing, thanks!

  • @NumptyBrainStorm  1 year ago  +1

    Learning R and differential analysis for ChIP-seq (DiffBind). THANKS!!!

  • @blankaroje8853  4 years ago  +2

    Thank you!

  • @victorhigareda4716  3 years ago  +1

    The reference sample could be one of the treatment samples or one of the control samples in an RNA-seq experiment, is that correct? Thank you for your great explanation.

  • @ElNick09  3 years ago  +2

    This video explains the process behind TMM (trimmed mean of M-values) normalization, as made clear at 10:37. I'm pointing this out in case anyone has come to this video, as I have, looking for an explanation of TMM normalization.

  • @Reza_Ghamsari  4 years ago  +1

    This is great, thank you. I don't understand how you calculated the weighted average (12:28). Is it just the average of the log-ratios?

    • @statquest  4 years ago  +1

      I'll be honest, I made this video a while ago and haven't thought about it much since, so I can't give you any more details about how edgeR works.

    • @simonhuang4807  2 years ago

      The weights are the inverses of the approximate asymptotic variances of the log-ratios (calculated using the delta method).
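A minimal Python sketch of the weighting described in the reply above, assuming the delta-method variance formula used in edgeR's TMM code: for one gene, (N_obs - Y_obs)/(N_obs*Y_obs) + (N_ref - Y_ref)/(N_ref*Y_ref), where Y is the gene's read count and N the library size in the observed and reference samples. Each gene's weight is the inverse of that variance, so genes with more reads count more. The function names and toy counts are hypothetical, and trimming is deliberately left out; this is an illustration, not edgeR's implementation.

```python
import math


def tmm_weight(y_obs, y_ref, n_obs, n_ref):
    """Inverse of the approximate (delta-method) variance of one gene's log-ratio.

    y_obs, y_ref: the gene's read counts in the observed and reference samples (> 0).
    n_obs, n_ref: total library sizes of the two samples.
    """
    variance = (n_obs - y_obs) / (n_obs * y_obs) + (n_ref - y_ref) / (n_ref * y_ref)
    return 1.0 / variance


def weighted_mean_log_ratio(obs, ref):
    """Inverse-variance weighted mean of log2((y_obs/N_obs) / (y_ref/N_ref)).

    obs, ref: counts for the same genes, in the same order, in the two samples.
    No trimming is applied in this sketch.
    """
    n_obs, n_ref = sum(obs), sum(ref)
    numerator = denominator = 0.0
    for y_obs, y_ref in zip(obs, ref):
        if y_obs == 0 or y_ref == 0:
            continue  # the log-ratio would be infinite, so skip this gene
        m = math.log2((y_obs / n_obs) / (y_ref / n_ref))
        w = tmm_weight(y_obs, y_ref, n_obs, n_ref)
        numerator += w * m
        denominator += w
    if denominator == 0:
        raise ValueError("no gene has non-zero counts in both samples")
    return numerator / denominator


# Hypothetical counts for five genes in two samples.
obs = [500, 40, 300, 0, 160]
ref = [450, 90, 310, 25, 125]
print(weighted_mean_log_ratio(obs, ref))
```

With these weights, a high-count gene with a modest log-ratio can pull the average more than a low-count gene with a large, noisy one.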

  • @c.p.8689  2 years ago  +1

    Love you!!

  • @garyhokawai  7 years ago  +2

    Just wondering: comparing edgeR to DESeq2, which one makes more sense for single-cell RNA-seq normalization?

    • @garyhokawai  7 years ago  +1

      So if my data have a large number of genes with zero counts, is DESeq2 preferable? BTW, I usually use ERCC spike-ins for the size factor calculation and apply the resulting factors to the endogenous genes.

  • @Adelphos0101  4 years ago

    Is there any reason for edgeR to use the 75th percentile (upper quartile) instead of the median to pick the reference sample?
    Very nice video to understand edgeR.

    • @statquest  4 years ago

      I think the point is to just exclude outliers with excessive read counts.
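A small Python sketch of the reference-sample choice discussed above, assuming the rule described in the video (and, as far as I know, used by edgeR 3.x's TMM code): scale each sample's counts by its library size, take the 75th percentile, and pick the sample whose value is closest to the mean of those percentiles. The function names and counts are hypothetical, and the linear interpolation mimics R's default quantile type 7.

```python
def upper_quartile(values):
    """75th percentile with linear interpolation (the same rule as R's default quantile type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * 0.75
    lo = int(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def pick_reference(samples):
    """Index of the sample whose library-size-scaled upper quartile is closest to the mean.

    samples: one list of gene counts per sample.
    """
    uqs = []
    for counts in samples:
        lib_size = sum(counts)
        uqs.append(upper_quartile([c / lib_size for c in counts]))
    mean_uq = sum(uqs) / len(uqs)
    return min(range(len(samples)), key=lambda i: abs(uqs[i] - mean_uq))


# Hypothetical counts for five genes in three samples (each library has 100 reads).
samples = [
    [10, 10, 10, 10, 60],  # scaled upper quartile 0.10
    [5, 10, 20, 25, 40],   # scaled upper quartile 0.25
    [1, 2, 3, 40, 54],     # scaled upper quartile 0.40
]
print(pick_reference(samples))  # 1
```

Using the upper quartile rather than the maximum keeps a handful of very highly expressed genes from deciding which sample becomes the reference.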

  • @igumnov.daniel  2 years ago

    Ty

  • @ns43253  3 years ago

    Do you have suggestions on whether someone should use edgeR or DESeq2 for 16S analysis of soil communities?

    • @statquest  3 years ago  +1

      To be honest, they are about the same. However, I know Mike Love is still adding tons of new visualizations to DESeq2, so that might be my favorite.

  • @LayneSadler  1 year ago

    I'm trying to think of a reason why I shouldn't just compare the case and control distributions by plotting the KS test p-value (y-axis, cutoff 0.05) against the difference in normalized means (x-axis, cutoff ±50 TPM). We want to know whether they come from the same distribution, and we don't want to chase tiny TPM changes.

    • @statquest  1 year ago  +1

      Unfortunately it's been way too long since I made this video or did any kind of bioinformatics work to give you a reasonable answer. However, my rough memory is that these methods (edgeR and DESeq2) gain power by pooling genes to estimate variation, and then gain more power by using a parametric test based on the negative binomial distribution. I think if you just went with a straight KS test, you wouldn't have any power.

  • @binnylinny  2 years ago

    edgeR just seems far more complicated than DESeq2. Is there any advantage edgeR has over DESeq2, apart from the artistic signature you mentioned towards the end? :P

    • @statquest  2 years ago  +1

      Not that I know of. I used to use edgeR, but switched to DESeq2 with no regrets.

  • @jesusmateoamillanocisneros6192

    Hello! Is it possible to make an association between an environmental variable and bacterial abundance? Sorry for my English!

    • @statquest  3 years ago

      I have no idea. Maybe someone else can help.

  • @dany271197  5 years ago  +1

    So you mean that edgeR needs weighted trimmed mean normalization, but DESeq2 does not?

    • @statquest  5 years ago

      DESeq2 has its own normalization that is similar, but a little different. Here's the link to my StatQuest that describes the method: czcams.com/video/UFB993xufUU/video.html

  • @suryakantastat0275  1 year ago

    How are the weights used for the weighted log2 ratios in this library calculated?

    • @statquest  1 year ago

      What time point in the video, minutes and seconds, are you asking about?

    • @suryakantastat0275  1 year ago

      At 12:20: how are the weights that are assigned calculated?

    • @statquest  1 year ago

      @suryakantastat0275 I believe edgeR uses the number of reads per gene in each sample to calculate the weighted average of the log values. For example, if we had two genes, Gene A with 100 reads and a log2 ratio of 0.05, and Gene B with 50 reads and a log2 ratio of 0.1, then the weighted average would be ((100*0.05) + (50*0.1))/(100 + 50) = 0.067. For more details on how to calculate a weighted average, see en.wikipedia.org/wiki/Weighted_arithmetic_mean
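The arithmetic in the reply above can be checked with a short Python sketch. The read counts and log2 ratios are the reply's toy numbers, and using raw read counts as weights is the reply's own simplification (the earlier reply from @simonhuang4807 describes inverse-variance weights instead). The helper name weighted_mean is hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights) or not values:
        raise ValueError("need the same, non-zero number of values and weights")
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Toy numbers from the reply: Gene A has 100 reads and a log2 ratio of 0.05,
# Gene B has 50 reads and a log2 ratio of 0.1; read counts serve as the weights.
log2_ratios = [0.05, 0.1]
read_counts = [100, 50]
print(round(weighted_mean(log2_ratios, read_counts), 3))  # 0.067
```

An unweighted mean of the same two values would be 0.075; the weighting pulls the result toward Gene A because it has twice as many reads.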

  • @kimseonhoon9704  2 years ago  +1

    12:31 I like it

  • @LayneSadler  1 year ago

    It's bananas that the top and bottom 30% of fold changes are discarded. Is that because they are prone to being ±inf? It's tricky that values less than 1 lead to exploding ratios.

    • @statquest  1 year ago

      Can you tell me what time point you're asking about (minutes and seconds)?

    • @LayneSadler  1 year ago  +1

      @statquest 9:47. But it appears they aren't actually dropped from the analysis, just from the calculation of the scaling factor, which makes sense.

    • @statquest  1 year ago  +1

      @LayneSadler Yep, that's correct. We just want the housekeeping genes for the scaling factor.
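A Python sketch of the trimming step discussed in this thread. It assumes edgeR's default TMM settings as I understand them (logratioTrim = 0.3 and sumTrim = 0.05): genes with a zero count in either sample are dropped first because their M values (log-ratios) would be ±inf, then 30% of the remaining genes are trimmed from each end of the M values and 5% from each end of the A values (average log expression). The function name tmm_keep is hypothetical, and ties are ranked by position rather than averaged as R's rank() does.

```python
import math


def tmm_keep(obs, ref, logratio_trim=0.3, sum_trim=0.05):
    """Indices of genes kept for the scaling factor after TMM-style trimming.

    obs, ref: counts for the same genes, in the same order, in the two samples.
    """
    n_obs, n_ref = sum(obs), sum(ref)
    genes, m_vals, a_vals = [], [], []
    for g, (y_obs, y_ref) in enumerate(zip(obs, ref)):
        if y_obs == 0 or y_ref == 0:
            continue  # M and A would be infinite, so the gene is dropped up front
        p_obs, p_ref = y_obs / n_obs, y_ref / n_ref
        genes.append(g)
        m_vals.append(math.log2(p_obs / p_ref))                   # M: log fold change
        a_vals.append((math.log2(p_obs) + math.log2(p_ref)) / 2)  # A: average log expression

    n = len(genes)

    def ranks(values):
        # 1-based ordinal ranks (ties broken by position)
        order = sorted(range(n), key=lambda i: values[i])
        r = [0] * n
        for position, i in enumerate(order, start=1):
            r[i] = position
        return r

    lo_m = math.floor(n * logratio_trim) + 1
    hi_m = n + 1 - lo_m
    lo_a = math.floor(n * sum_trim) + 1
    hi_a = n + 1 - lo_a
    rank_m, rank_a = ranks(m_vals), ranks(a_vals)
    return [genes[i] for i in range(n)
            if lo_m <= rank_m[i] <= hi_m and lo_a <= rank_a[i] <= hi_a]


# Hypothetical counts: gene 9 has zero reads in the observed sample.
ref = [50] * 10
obs = [5, 10, 20, 40, 50, 60, 80, 100, 200, 0]
print(tmm_keep(obs, ref))  # [2, 3, 4, 5, 6]
```

Only the kept genes go into the weighted mean of M values that becomes the scaling factor; as the reply says, trimmed genes remain in the rest of the analysis.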

  • @henricker  3 years ago

    I really laughed my ass off at 12:30, thanks for the video.
    To my understanding, isn't it odd that the reference sample can have 0 reads for a gene? Wouldn't it be possible to choose a reference sample for each gene to avoid this issue? I don't see how this makes sense logically, but I might have missed something. Thank you!

    • @statquest  3 years ago

      What time point, minutes and seconds, are you asking about?

  • @someone_there  2 years ago

    Well, fine, but how do you use edgeR?

    • @statquest  2 years ago  +2

      To be honest, I found the manual for edgeR relatively easy to follow. It has a lot of examples.

    • @someone_there  2 years ago  +2

      @statquest Actually, I couldn't find any good edgeR workflow tutorial on YouTube, with coding explanations, etc. If you have time to publish a good video about that, it would be extremely helpful.

    • @statquest  2 years ago  +2

      @someone_there I wish I could, but it's been years since I used edgeR. :(

    • @someone_there  2 years ago  +2

      @statquest Oh, I see... Well, thanks a lot for your answers anyway :)

  • @sunnetinternationalbusines9910

    edgeR seems to make more sense than DESeq2 to me.