ANOVA: Crash Course Statistics #33

  • added 2 Jul 2024
  • Today we're going to continue our discussion of statistical models by showing how we can find out whether there are differences between multiple groups using a collection of models called ANOVA. ANOVA, which stands for Analysis of Variance, is similar to regression (which we discussed in episode 32), but allows us to compare three or more groups for statistical significance.
    Crash Course is on Patreon! You can support us directly by signing up at / crashcourse
    Thanks to the following Patrons for their generous monthly contributions that help keep Crash Course free for everyone forever:
    Mark Brouwer, Kenneth F Penttinen, Trevin Beattie, Satya Ridhima Parvathaneni, Erika & Alexa Saur, Glenn Elliott, Justin Zingsheim, Jessica Wode, Eric Prestemon, Kathrin Benoit, Tom Trval, Jason Saslow, Nathan Taylor, Brian Thomas Gossett, Khaled El Shalakany, Indika Siriwardena, SR Foxley, Sam Ferguson, Yasenia Cruz, Eric Koslow, Caleb Weeks, D.A. Noe, Shawn Arnold, Malcolm Callis, Advait Shinde, William McGraw, Andrei Krishkevich, Rachel Bright, Mayumi Maeda, Kathy & Tim Philip, Jirat, Ian Dundore
    --
    Want to find Crash Course elsewhere on the internet?
    Facebook - / youtubecrashcourse
    Twitter - / thecrashcourse
    Tumblr - / thecrashcourse
    Support Crash Course on Patreon: / crashcourse
    CC Kids: / crashcoursekids

Comments • 164

  • @phlippindolfy
    @phlippindolfy 5 years ago +1059

    I'm here in the desperate hope that this will help me understand stats after a full semester of classes.

  • @Xman3456
    @Xman3456 5 years ago +713

    I appreciate the effort into making the video, but I was a bit overwhelmed by all the graphics and the speed of the explanations.

  • @genericruler
    @genericruler 5 years ago +308

    5:00 The distance between each point and its group mean is the residual error (SSE). The SSM would be the difference between each group mean and the grand mean.

    • @genericruler
      @genericruler 5 years ago +21

      10:36 Shouldn't SSE (residual) be the sum of (Xi - Xbar_group) squared?

    • @soulfrench
      @soulfrench 5 years ago +51

      You are totally right. I have taken a look at my statistics textbook and it says that SST (total sum of squares) = SSM + SSE, and SSM is calculated as (difference between each group mean and the grand mean)^2 * (total number of categories). I am surprised by two things. The first is that Crash Course made this kind of huge mistake when explaining ANOVA, and the second is that no one actually noticed it (judging only by this comment section) except you, Ronan.

    • @soulfrench
      @soulfrench 5 years ago +52

      The explanations of SSM and SSE are exactly the same in this video, which means that one of them is wrong.

    • @patrickjane5796
      @patrickjane5796 5 years ago +17

      Had the same doubt, checked the comments for confirmation, and found your reply. Thank you!

    • @andriistadnik6775
      @andriistadnik6775 5 years ago +8

      I guess we have to contact the Crash Course team somehow to tell them about that, in case other people rely on those videos.
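
Editor's sketch of the distinction this thread is making. The numbers are made up for illustration and are not from the video; SSM here weights each group by its size, which is the usual textbook form. SSE compares each point with its own group mean, SSM compares each group mean with the grand mean, and the two add up to the total sum of squares.

```python
# Made-up ratings for three hypothetical groups (not data from the video).
groups = {
    "A": [3.0, 3.5, 4.0],
    "B": [2.0, 2.5, 3.0],
    "C": [3.5, 4.0, 4.5],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = sum(all_values) / len(all_values)
group_means = {name: sum(v) / len(v) for name, v in groups.items()}

# SSE: squared distance of each point from its own group mean.
sse = sum((x - group_means[name]) ** 2 for name, v in groups.items() for x in v)
# SSM: squared distance of each group mean from the grand mean, once per observation.
ssm = sum(len(v) * (group_means[name] - grand_mean) ** 2 for name, v in groups.items())
# SST: squared distance of each point from the grand mean.
sst = sum((x - grand_mean) ** 2 for x in all_values)

print(f"SSM={ssm:.3f}  SSE={sse:.3f}  SST={sst:.3f}")
```

With these numbers SSM and SSE are clearly different quantities, yet SSM + SSE equals SST, which is the identity the commenters cite.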

  • @llamafromspace
    @llamafromspace 5 years ago +486

    Today I learned that the word ANOVA exists, and that I shouldn't jump halfway into a course.

    • @swampertblaziken1
      @swampertblaziken1 5 years ago +4

      ANOVA - Analysis of Variance

    • @rafnaegels8913
      @rafnaegels8913 5 years ago +2

      This + 9000!!!

    • @lichbanelb
      @lichbanelb 5 years ago +7

      This is a 2nd year statistics topic (at least at my uni), so it's not easy!

    • @LiteralCats
      @LiteralCats 5 years ago +4

      Lol. I'm halfway through the video, and this is what I learnt so far xD *goes back to earlier videos*

    • @lokeshsah7
      @lokeshsah7 5 years ago +2

      I got the same lesson 😂😂

  • @anastasiostresinis6899
    @anastasiostresinis6899 4 years ago +16

    This presentation helped me gain further insight into ANOVA. Wish I had you (this goes to the whole cast) as a stats teacher! Big THANKS!!!

  • @NavajoMX
    @NavajoMX 5 years ago +2

    Thank you! I've needed this episode for years.

  • @Noob___Noob
    @Noob___Noob 5 years ago +168

    ANOVA: I learned it, was tested on it, aced it, but never really understood what it is.

    • @Jasuta123
      @Jasuta123 5 years ago +2

      Yah me too ... The software is too complicated...

    • @voltairesarmy6702
      @voltairesarmy6702 5 years ago +2

      I took two classes using ANOVA b4 learning (on my own) the connection to GLMs lol

    • @teunvandenbrand1324
      @teunvandenbrand1324 5 years ago +12

      The intuition I have about ANOVA is that it tests whether the variance between groups exceeds the variance within groups. Maybe that could help.

    • @oldcowbb
      @oldcowbb 5 years ago +6

      basically my college life
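
Editor's sketch of the "between versus within" intuition from this thread, assuming a plain one-way ANOVA. The function name `f_statistic` and the data are illustrative, not taken from the video. The F statistic is the between-group mean square divided by the within-group mean square, so it grows when group means are far apart relative to the scatter inside each group.

```python
def f_statistic(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group variation: group means around the grand mean.
    ssm = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variation: points around their own group mean.
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ssm / (k - 1)  # model degrees of freedom: k - 1
    ms_within = sse / (n - k)   # error degrees of freedom: n - k
    return ms_between / ms_within

# Made-up data for three hypothetical groups.
example = [[3.0, 3.5, 4.0], [2.0, 2.5, 3.0], [3.5, 4.0, 4.5]]
print(f_statistic(example))
```

An F near 1 means the groups differ about as much as individual points do within a group; a large F is what the test treats as evidence of a group effect.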

  • @rhiwright
    @rhiwright 5 years ago +53

    I am so glad SPSS does most of the work and I'm just learning to interpret the results. Interesting that you moved on to ANOVA in the same week my quantitative and qualitative research methods module at uni did :)

  • @beatitudesteffen9615
    @beatitudesteffen9615 4 years ago +1

    You are an incredibly eloquent speaker. Thanks for this explanation!

  • @km1dash6
    @km1dash6 5 years ago +38

    I'm a grad student studying psychology. In a couple weeks, I have to take a class on ANOVA. This really helps.

  • @rib_rob_personal
    @rib_rob_personal 5 years ago +22

    Wow. These are coming out right when I need them lol. Taking a hard statistics course.

  • @douglasmaxwell6547
    @douglasmaxwell6547 5 years ago +1

    Brilliant video, thanks for sharing.

  • @desankad.870
    @desankad.870 4 years ago +16

    You make statistics so understandable and not as abstract! I am not so scared of t-tests, z-tests, F-tests and ANOVA anymore! Why do most statistics teachers make it seem so scary? Statistics is great! (esp for curious minds like myself ^_^)

  • @kimberlyt7986
    @kimberlyt7986 8 months ago

    Thank you Adriene for helping me pass this class 🙏

  • @francis112233445566
    @francis112233445566 4 years ago +36

    Great video, although I wouldn’t recommend running three t-tests after the ANOVA without first applying the Bonferroni correction! That means using an alpha level of 0.05 divided by the number of comparisons you’re making (in this case 3). Use this corrected alpha level to determine significance, otherwise you may run into family-wise error problems and make a type 1 error.
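
Editor's sketch of the correction described in this comment. The helper name `bonferroni` and the three p-values are hypothetical; only the rule itself (alpha divided by the number of comparisons) comes from the comment.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the corrected threshold and a significance flag for each p-value."""
    threshold = alpha / len(p_values)
    return threshold, [p < threshold for p in p_values]

# Hypothetical p-values from three pairwise t-tests.
p_values = [0.030, 0.010, 0.200]
threshold, significant = bonferroni(p_values)

print(f"corrected alpha = {threshold:.4f}")
print(significant)
```

Note that the first p-value would pass an uncorrected 0.05 cutoff but does not pass the corrected one, which is exactly the case the commenter is warning about.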

  • @tiffanyszymanski5956
    @tiffanyszymanski5956 1 year ago

    This video was super helpful!! Thank you!!! ❤

  • @ismellcakes1
    @ismellcakes1 4 years ago

    Simply amazing. Thank you!

  • @Ureyeuh
    @Ureyeuh 4 years ago +5

    This information is spit out insanely fast.

  • @adamheckenberg5861
    @adamheckenberg5861 1 year ago +11

    This is incredible. Taking Stats for Psyc right now, getting a lot harder as it goes on. Thank you!

  • @JEOGRAPHYSongs
    @JEOGRAPHYSongs 5 years ago +9

    NOVA has been one of my favorite PBS programs for 3 decades now.

  • @sansm5285
    @sansm5285 4 years ago +2

    Hey, I love this series, it's helping me through a semester of Corona-Statistics.
    I just think I might have found a mistake at 10:57 for the model sum of squares, because the sum should go from i=1 to k, instead of to n as the figure says.

  • @kiou97
    @kiou97 1 year ago

    For the first example and the slope calculation, it should be the opposite (μ1-μ0) in the numerator, just to avoid any confusion with regard to the code names for rainy and non-rainy days. Thanks Crash Course team for all your efforts and teaching, and thanks Adriene for this particular course which, for me as an engineer, was a tough lesson after all those years of avoiding Statistics courses 😅
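
Editor's sketch of the point about the slope, under an assumed coding of 0 for non-rainy days and 1 for rainy days. The bunny counts are made up, and `ols` is an illustrative helper. With a 0/1 predictor, the least-squares slope comes out as μ1 - μ0 and the intercept as μ0.

```python
def ols(x, y):
    """Simple least-squares fit of y = intercept + slope * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Assumed coding: 0 = not rainy, 1 = rainy. Counts are invented for illustration.
rainy = [0, 0, 0, 1, 1, 1]
bunnies = [5, 6, 4, 1, 2, 0]

intercept, slope = ols(rainy, bunnies)
mu0 = sum(b for r, b in zip(rainy, bunnies) if r == 0) / rainy.count(0)
mu1 = sum(b for r, b in zip(rainy, bunnies) if r == 1) / rainy.count(1)

print(intercept, slope, mu1 - mu0)
```

Flipping which category is coded as 1 flips the sign of the slope, which is why the order in the numerator depends on the coding.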

  • @amulyagupta9161
    @amulyagupta9161 4 years ago +13

    I love crash course videos for their simplicity but couldn't make out much from this one

  • @albyv.4209
    @albyv.4209 4 years ago +1

    I have my intro to bio stats final tomorrow and these videos are my 3 am Hail Mary half court shot.

  • @danielduvernay3207
    @danielduvernay3207 5 years ago +2

    omg love this video

  • @taylorharris8078
    @taylorharris8078 5 years ago +16

    I love crash course! But this is not an introductory level video. There are others that explain anova more simply

  • @zackwise1852
    @zackwise1852 5 years ago +32

    I really love all of Crash Course's content, but I've been having a tough time following this series. After rewatching this episode and the previous one multiple times, I'm still confused. In this episode, both SSM and SSE are described as the sum of squares between each point and its group mean, but SSM and SSE are different! If someone could explain this to me I'd really appreciate it.

    • @Mr_Wallet
      @Mr_Wallet 5 years ago +4

      This is the only CC series to date that seems to be geared very specifically at being a class supplement and not necessarily accessible to someone only watching the videos (although Engineering is also skirting the line a little bit). It's been fairly disappointing.

    • @chelseaparlett8069
      @chelseaparlett8069 5 years ago +11

      I'm sorry if there was an error.
      SSE is the sum of the squared distance between each point and its group mean (more generally it's the distance between the data point and the predicted value).
      SSModel is the sum of squared distance between the model prediction and the grand (overall) mean.

    • @zackwise1852
      @zackwise1852 5 years ago +1

      @@chelseaparlett8069 Thanks for the clarification, I think I understand now :)

    • @lenamaas9233
      @lenamaas9233 5 years ago

      @@chelseaparlett8069 thank you!

    • @winnieb3324
      @winnieb3324 4 years ago

      @@chelseaparlett8069 Is the predicted value in SSE basically the predicted mean?

  • @aaronmarks9366
    @aaronmarks9366 5 years ago

    My favorite statistics documentary series is PBS ANOVA

  • @Jesusiscomingback
    @Jesusiscomingback 4 years ago

    I love Indiana, my family is from there. I’m hype. Love u guys. Thanks for your help. You guys literally help me in every class I have. I go to college online. CTU online. Thanks guys for real.

    • @sudeepjoseph69
      @sudeepjoseph69 4 years ago

      Hey! Shut your mouth, you pig! You are always talking. Learn to live a little.

  • @olgaalejo8550
    @olgaalejo8550 5 years ago +1

    Thank you!!!

  • @Grv28097
    @Grv28097 5 years ago +1

    You are a life saver!

  • @lianggegou
    @lianggegou 5 years ago +45

    I love the examples but this goes way too fast, I had a hard time following the explanations 😢

    • @greensteve9307
      @greensteve9307 5 years ago +3

      Just watch it on x0.75 then, or pause it and go back.

  • @user-ht4vw2wo4h
    @user-ht4vw2wo4h 4 years ago

    Thank You ❣!

  • @CMunkMunk
    @CMunkMunk 5 years ago +117

    Hi graphics team, ß ≠ β 😉

  • @NeilNileStudios
    @NeilNileStudios 5 years ago

    Cool, Hill is back. I liked her in econ

  • @nightsazrael
    @nightsazrael 5 years ago +3

    A bunny preserve how cool. Also I really have to think hard to understand your videos, but it is always worth it. I never gamble, but life is a gamble and statistics are one of the best ways to make a decision. Not always the right decision, but random chance rules the world.

  • @dinomoviesnstuff
    @dinomoviesnstuff 5 months ago +3

    Very confusing.

  • @maftoumiali4412
    @maftoumiali4412 4 years ago

    You're amazing

  • @IamMathenge
    @IamMathenge 1 year ago

    Thank you. Apparently I am understanding this 1 year after campus, now that I'm in data science.

  • @MasterofPlay7
    @MasterofPlay7 5 years ago

    So if the mean of one or more groups is skewed by outliers or missing values, is ANOVA's result between the groups still valid? Since the parameters for ANOVA are the variances.

  • @researchtech5830
    @researchtech5830 5 years ago

    Nice explanation.

  • @kylehenderson9489
    @kylehenderson9489 5 years ago +10

    YES. There is largely unpalatable chocolate. I've eaten some.

  • @caitlincunningham8944
    @caitlincunningham8944 4 years ago

    Would there be a point in doing an ANOVA for two groups, or would it be easier to just do a T-test?

  • @stephenlippi5724
    @stephenlippi5724 5 years ago +54

    You can't run multiple T-tests... this inflates the rate of Type I error!

    • @gardenhead92
      @gardenhead92 5 years ago +9

      They said they'll address that in a future episode

    • @stephenlippi5724
      @stephenlippi5724 5 years ago +18

      That's definitely good because as soon as they said "just run 3 t-tests" I almost fell out of my chair. Doesn't help those students watching this who are now like "oh just run t tests!"

    • @doonce
      @doonce 5 years ago +17

      Ya, you have to do a post-hoc test like Tukey. Otherwise, there's no point in doing the ANOVA in the first place, just do t-tests.

    • @lakudzala195
      @lakudzala195 5 years ago +3

      If you were doing this study would it be better to do all 3 t-tests with the bonferroni correction and present all 3 in a paper, or find the t-test that shows the strongest result and only present that one?

    • @voltairesarmy6702
      @voltairesarmy6702 5 years ago +1

      @@lakudzala195 I don't remember the bonferroni(spelling?Lol) correction but just wanted to say, there's a push for presenting confidence intervals in papers. So, regardless of what you end up doing, I suggest using confidence intervals. Also, maybe look for it on google scholar (also related topic: replication in science). (I'm assuming this is a scientific study of some kind. )
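
Editor's sketch of why this thread objects to running several uncorrected t-tests. It assumes the tests are independent, which pairwise comparisons on the same groups are not exactly, so treat the figure as a rough guide. The function name is illustrative.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests

for n_tests in (1, 3, 10):
    rate = familywise_error_rate(0.05, n_tests)
    print(f"{n_tests:2d} tests at alpha=0.05 -> family-wise rate {rate:.3f}")
```

Under that assumption, three tests at 0.05 each carry roughly a 14% chance of at least one false positive, which is the inflation that corrections such as Bonferroni or Tukey's procedure are meant to control.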

  • @user-mf9wy6rs6t
    @user-mf9wy6rs6t 4 years ago +3

    There was one big mistake....never talk about chocolate in maths 😄
    I was not able to think about calculations but chocolate.
    Overall was pretty clear:)

  • @jamicub39
    @jamicub39 5 years ago +1

    Is it a bumpy or slippery slope? Si there's a variable difference.

  • @kierannurmi5488
    @kierannurmi5488 5 years ago +1

    Is there any situation where an F test would say not statistically significant but a T test would? The fact that you said a failed F test means a relationship "probably" doesn't exist seems to imply that it can. What would you do in that case?

  • @tohtine
    @tohtine 5 years ago +7

    I think your explanation for the model sum of squares is incorrect; it should be the sum of squared differences between group means and the overall mean.

  • @INSPirrationalNATURE
    @INSPirrationalNATURE 4 years ago +3

    You're gonna save my master's degree *.*

  • @ikahn17
    @ikahn17 5 years ago +1

    I thought this was going to be about my sous vide circulator lol

  • @zeio-nara
    @zeio-nara 5 years ago +2

    It sounds like SSR and SSE are the same thing

  • @loganl3746
    @loganl3746 5 years ago

    Yeah, but which potato varieties did best in Martian soil, supplemented with human manure and bacteria cultures?

  • @himanshukhandelwal9226

    Is it a complete course on statistics? I mean, does it include most of what we need to know about statistics?

  • @grainfrizz
    @grainfrizz 5 years ago +1

    Is it right to say that ANOVA is the same as a T-test, but the former is for when you have more than 2 groups?

    • @voltairesarmy6702
      @voltairesarmy6702 5 years ago +4

      It's right that the ANOVA is used for cases where a t-test is inappropriate/inadequate because there are more than two groups to compare.
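
Editor's sketch relating the two tests for the two-group case. It assumes the pooled-variance (equal variance) t-test; the helper names and data are illustrative. With exactly two groups, the one-way ANOVA F statistic equals the square of that t statistic, so the two tests reach the same conclusion.

```python
import math

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (equal variances assumed)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (len(a) + len(b) - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))

def anova_f(groups):
    """One-way ANOVA F statistic."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    ssm = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    sse = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ssm / (len(groups) - 1)) / (sse / (len(values) - len(groups)))

# Made-up data for two hypothetical groups.
group_a = [5.0, 6.0, 4.0]
group_b = [1.0, 2.0, 0.0]

t = pooled_t(group_a, group_b)
f = anova_f([group_a, group_b])
print(t ** 2, f)
```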

  • @jonathanblackwell42
    @jonathanblackwell42 5 years ago +7

    ANOVA beat me up in stats class...

  • @EmilyTotallynotbees
    @EmilyTotallynotbees 4 years ago

    I wanna walk through a bunny preserve to work 🥺

  • @vegangelo_29
    @vegangelo_29 4 years ago +1

    So instead of using ANOVA, why not just use multiple T-tests?

  • @BlezzBeats
    @BlezzBeats 4 years ago

    ANOVA is a great tasting chocolate bean.

  • @NamithaMariaCherian
    @NamithaMariaCherian 1 year ago

    When you are calculating, SSM- it is the difference between the overall mean and the mean of each group. SSE- is the difference between observed data and the group means. The SSM is explained incorrectly in the video. But otherwise, great content. Thank you.

  • @JasonOlshefsky
    @JasonOlshefsky 5 years ago

    Is there a variation of GLM that relies on median rather than mean? I kind of doubt it because it doesn't work mathematically ... but I have read that medians are a "more accurate" measure of "typical" than means. For instance, in the bunnies example, if one day the sanctuary sent all the bunnies outside on a sunny day and you saw 30 bunnies, it would skew your 1-or-5 general model strongly.

    • @soulfrench
      @soulfrench 5 years ago

      Hey, General linear model and Generalized linear model(GLM) are two different things.

    • @teunvandenbrand1324
      @teunvandenbrand1324 5 years ago +2

      Most of the time you could take a non-parametric test over a parametric test if you're concerned that your data doesn't follow a theoretical distribution. Non-parametric tests are often based on rank. The good thing is that they are robust, the downside is that you lose some statistical power.
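
Editor's sketch of a rank-based alternative of the kind this reply mentions. The Kruskal-Wallis H statistic is used here as one common example; the thread does not name a specific test. The helper names and counts are made up, and the tie correction factor is omitted for brevity.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for position in range(i, j + 1):
            ranks[order[position]] = shared_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction applied)."""
    values = [x for g in groups for x in g]
    ranks = average_ranks(values)
    n = len(values)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

# Made-up counts; the second set replaces 9 with an extreme value of 30.
without_outlier = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
with_outlier = [[1, 2, 3], [4, 5, 6], [7, 8, 30]]

print(kruskal_wallis_h(without_outlier), kruskal_wallis_h(with_outlier))
```

Because only the ranks enter the statistic, the extreme value changes nothing here, which illustrates the robustness the reply describes. The cost, as the reply says, is some loss of statistical power.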

  • @emilyneufeld673
    @emilyneufeld673 5 years ago +1

    K. I have a question... why exactly do you hate the ever so extraordinary SPONGE???

  • @raeidm.raunak4927
    @raeidm.raunak4927 5 years ago +1

    Do a Crash Course History on the Bangladeshi war of independence in 1971. I have a project and would love it if you did a video on it.

  • @voltairesarmy6702
    @voltairesarmy6702 5 years ago +1

    Since it's a Kaggle dataset, did you use R or python to analyze the data? Or did you download it and use Excel, SPSS, Stata, SAS, etc to analyze the data?

  • @SolSystemDiplomat
    @SolSystemDiplomat 5 years ago +3

    I like cookies

  • @StevenVenette
    @StevenVenette 5 years ago +1

    Cacao bean difference here is an example of the danger of significance testing. I would argue that a mean difference of .17, on the scale being used, is not meaningful.

    • @gardenhead92
      @gardenhead92 5 years ago +1

      I think most people would agree, which is why you should always present your effect size along with your p-value :)

    • @voltairesarmy6702
      @voltairesarmy6702 5 years ago

      Also, presenting confidence intervals is a good idea!

    • @teunvandenbrand1324
      @teunvandenbrand1324 5 years ago

      Also the ratings are on an ordinal scale, not a continuous one (as seen by the discrete values the ratings can take). So applying a non-parametric test might be more useful.

    • @voltairesarmy6702
      @voltairesarmy6702 5 years ago

      @@teunvandenbrand1324 well if we care about that, an ordered logit / probit would work. C:
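
Editor's sketch of reporting an effect size next to the p-value, as suggested in this thread. Eta squared (SSM divided by SST) is used as one common choice for one-way ANOVA; the thread does not name a specific measure, and the ratings below are invented.

```python
def eta_squared(groups):
    """Share of total variation explained by group membership: SSM / SST."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    ssm = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    sst = sum((x - grand_mean) ** 2 for x in values)
    return ssm / sst

# Made-up ratings for three hypothetical bean types.
ratings = [[3.0, 3.5, 4.0], [2.0, 2.5, 3.0], [3.5, 4.0, 4.5]]
effect = eta_squared(ratings)
print(f"eta squared = {effect:.2f}")
```

A result can be statistically significant with a very small eta squared when the sample is large, which is the situation the original comment is worried about.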

  • @danconrad920
    @danconrad920 5 years ago +1

    Unpalatable chocolate?
    Yeah,...it's called carob

  • @user-qh2ki2rz4m
    @user-qh2ki2rz4m 6 months ago

    9 grand a year to learn more from a 5-year-old YouTube playlist than in my Stats lectures... (I have an exam on this and I am so screwed)

  • @mohamedaitkhouyamouh5599

    This is literally better than sharing the bed with my gf. Thank you so much for the work, absolutely mind-blowing.

  • @alexmarvin3093
    @alexmarvin3093 4 years ago

    Adriene Hill is the best, no one compares

  • @toniisaurVODS
    @toniisaurVODS 5 years ago +1

    Using ordinal data is a bad example with the cocoa bean type. You can't use the mean as a measure of central tendency when it has no meaning, i.e. what is the average of strongly agree and disagree? It's also a really bad idea to teach doing multiple t-tests, as it increases the Type I error and defeats the whole point of ANOVA. It would have been better to show Tukey’s HSD to determine which means are different.

  • @dr.jackauty4415
    @dr.jackauty4415 5 years ago

    Bunny count would not be Gaussian. Probably Poisson or negative binomial.

  • @andresmc210
    @andresmc210 5 years ago

    Please feature bunnies more often.

  • @liamc3995
    @liamc3995 5 years ago

    This isn’t John Green.

  • @Stoic_Panda
    @Stoic_Panda 5 years ago +1

    wait was this a 2 way or 1 way ANOVA? Lol what is the difference?

  • @ezhilarasankandaswamin4339

    Can you suggest a good book to follow along with the Crash Course series and for further practice?

    • @voltairesarmy6702
      @voltairesarmy6702 5 years ago

      Open Intro Statistics is decent. It's free to get an ebook and has decent resources too. I used it in a class lol

    • @ezhilarasankandaswamin4339
      @ezhilarasankandaswamin4339 5 years ago

      @@voltairesarmy6702 thanks, I will start by downloading it from the web.
      Can you give me your e-book suggestion?

  • @Cormac_YT
    @Cormac_YT 5 years ago +4

    *NOTIFICATION SQUAD WHERE YOU AT? 🔥💯💪*

  • @anikamaynard8132
    @anikamaynard8132 5 years ago +2

    This doesn't make ANOVA easy to understand at all. It doesn't take into consideration that people are only now learning this whole concept...

  • @unleashingpotential-psycho9433

    Statistics is way better than geometry.

  • @sudeepjoseph69
    @sudeepjoseph69 4 years ago +1

    This series has the lowest viewership compared to all other series in cc

  • @RohaZahidi
    @RohaZahidi 5 years ago +10

    Even though I've taken an entire semester's worth of classes on statistics, these videos are actually even more confusing. You guys focus too much on keeping the videos short and end up explaining nothing at all. There is information and you make a few good points, but it's nothing one can't get from a regular math website. The visuals are a waste and it all seems pretty forced, like you're just reading off a screen.

  • @BigYellowJoint1
    @BigYellowJoint1 5 years ago +8

    Just use SPSS

  • @iefe65
    @iefe65 5 years ago

    If we can know exactly the statistical significance between different groups by using t tests for every 2 groups, why even bother with the f-test in the first place lol ?

  • @jonathandominguez300
    @jonathandominguez300 5 years ago +1

    Hey! Explain the story of Scheherazade. Pwease.

  • @nareshchinnam8349
    @nareshchinnam8349 4 years ago

    Very difficult to follow with this speed of the explanation.

  • @nytmare3448
    @nytmare3448 5 years ago

    DFTBAQ (hey, did you know you can type DFTBAQ with your left hand only?)

  • @harrygroundwater2590
    @harrygroundwater2590 8 months ago +1

    Anyone here from ANU?

  • @fruitninja8475
    @fruitninja8475 4 years ago

    still can't get it. I'm an idiot. sorry.

  • @sarocturtlegaming7306
    @sarocturtlegaming7306 4 years ago +1

    yup nope still confused I miss the dude D: TAKE ME BACK TO SCIENCE

  • @coolhaddool3680
    @coolhaddool3680 5 years ago +2

    Honestly, I have no idea what she is saying.

  • @ninasimoneh4030
    @ninasimoneh4030 4 years ago

    Omg I think I'm worse off..

  • @Blubgamer
    @Blubgamer 5 years ago +1

    bro!

  • @mamasophie8597
    @mamasophie8597 4 years ago

    UMMMMMMM AM I AN IDIOT OR HOW DO U CALCULATE THE P-VALUE??????
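
Editor's sketch of one way to get a p-value for the F statistic without an F-distribution table. This is a permutation test, a swapped-in technique and not the method used in the video, which would normally look the value up in the F distribution with (k - 1, n - k) degrees of freedom. The function names, data, seed and number of permutations are all illustrative choices.

```python
import math
import random

def f_statistic(groups):
    """One-way ANOVA F statistic; returns infinity if there is no within-group spread."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    ssm = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    sse = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    if sse == 0:
        return math.inf
    return (ssm / (len(groups) - 1)) / (sse / (len(values) - len(groups)))

def permutation_p_value(groups, n_permutations=2000, seed=0):
    """Estimate how often shuffled group labels give an F at least as large as observed."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            hits += 1
    # Add-one adjustment so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Made-up data for three hypothetical groups.
example = [[3.0, 3.5, 4.0], [2.0, 2.5, 3.0], [3.5, 4.0, 4.5]]
p = permutation_p_value(example)
print(f"approximate p-value: {p:.3f}")
```

The idea is the same as with the F table: the p-value is the probability of seeing an F this large if group membership made no difference. Here that probability is estimated by shuffling the labels instead of using the theoretical distribution.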

  • @dyngjean4532
    @dyngjean4532 5 years ago +1

    Lol what people will do for the first comment...

  • @andreaqui1653
    @andreaqui1653 4 years ago

    this didn't make sense at all fam

  • @mzms4l422
    @mzms4l422 5 years ago +1

    1st

  • @rachaelharwood9063
    @rachaelharwood9063 4 years ago

    This is way too fast

  • @birdygamer5224
    @birdygamer5224 5 years ago +1

    I think I might understand this a little better if she used a video game example

  • @ZIlxIM
    @ZIlxIM 5 years ago +1

    .

  • @KristopherStockholm
    @KristopherStockholm 5 years ago +1

    First

  • @danielmclaughlin5573
    @danielmclaughlin5573 5 years ago

    Yes. Of course there is unpalatable chocolate out there. It's called chocolate.