Simple Linear Regression: Checking Assumptions with Residual Plots

  • Added 4 Dec 2012
  • An investigation of the normality, constant variance, and linearity assumptions of the simple linear regression model through residual plots.
    The pain-empathy data is estimated from a figure given in:
    Singer et al. (2004). Empathy for pain involves the affective but not sensory components of pain. Science, 303:1157--1162.
    The Janka hardness-density data is found in:
    Hand, D.J., Daly, F., Lunn, A.D., McConway, K., and Ostrowski, E., editors (1994). The Handbook of Small Data Sets. Chapman & Hall, London.
    Original source: Williams, E.J. (1959). Regression Analysis. John Wiley & Sons, New York. Page 43, Table 3.7.
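
A minimal sketch of the residual computation behind these plots, assuming simulated data rather than the pain-empathy or Janka data (the coefficients, sample size, and seed below are hypothetical). It fits a straight line by least squares, forms the residuals, and compares their spread in the lower and upper halves of x as a rough numeric stand-in for what the eye checks in a residual plot.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Hypothetical data: a linear relationship with constant error variance.
x = rng.uniform(0, 10, size=200)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=200)

# Least squares fit of y = b0 + b1 * x (np.polyfit returns the slope first).
b1, b0 = np.polyfit(x, y, deg=1)
fitted = b0 + b1 * x
residuals = y - fitted

# Rough check of constant variance: residual spread for small x vs. large x.
lower = residuals[x < np.median(x)]
upper = residuals[x >= np.median(x)]
spread_ratio = upper.std(ddof=1) / lower.std(ddof=1)

print(f"slope={b1:.3f}, intercept={b0:.3f}, spread ratio={spread_ratio:.2f}")
```

A scatter plot of `residuals` against `x` (or against `fitted`) gives the residual plot itself; a spread ratio far from 1 would hint at non-constant variance, although the plot remains the primary tool.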

Comments • 148

  • @48956l
    @48956l 8 years ago +152

    I'M GIVIN THIS VIDEO THE BIG CHECK MARK

  • @user-pl7zr2jm5h
    @user-pl7zr2jm5h 5 months ago +4

    this was posted 11 years ago T-T and has the best explanations and videos on statistics I have ever found, thank you so much for all your hard work and legacy, i hope you know you're my savior.

    • @jbstatistics
      @jbstatistics 5 months ago +3

      I'm glad to be of help! 11 years, where'd they go? :)

  • @jbstatistics
    @jbstatistics 11 years ago +50

    "think you guys should get more views..."
    Thanks! (And I'll take as a compliment that you said "you guys", since this is a one man show.) Getting lots of views isn't very high on my priority list -- I'm just trying to provide the best resources for my students that I can. (I haven't done any promotion, and I don't allow ads on the videos.)
    There are many students in intro stats in North America and around the world, and I'm glad that some of them find my videos helpful.

    • @williamlee0
      @williamlee0 4 years ago +1

      I'd upvote you x10 if I could just for the anti-advert policy.

  • @nkululekoshabane3373
    @nkululekoshabane3373 9 years ago +40

    One of the best, if not the best, video on regression analysis I've seen. Thank you very much for creating it. Your service is highly appreciated.

    • @jbstatistics
      @jbstatistics 9 years ago +4

      Nkululeko Shabane You are very welcome, and thank you very much for the compliment!

  • @GuppyPal
    @GuppyPal 2 years ago +13

    This is exactly what I have needed. My professor goes over these plots but has been doing statistics at a high level so long that I think it's hard for him to relate to someone who is new to it. I really needed someone to just explain it all from start to finish, and you did that. Thank you so much! Your videos are so, so helpful. Sincerely, a first year statistics graduate student.

  • @doodelay
    @doodelay 5 years ago

    "The residual plot removes that increasing trend and then re-scales the y axis, so it's a little bit easier to see these issues.. sometimes in the residual plot." Now that is some serious insight. Thank you so much and this video was superb with really excellent examples!

  • @jbstatistics
    @jbstatistics 11 years ago +2

    You are very welcome Simon!

  • @snake1625b
    @snake1625b 8 years ago

    Excellent methods used to help students learn in this vid. This is the future of education!

  • @valeriereid2337
    @valeriereid2337 1 year ago +1

    Thank you for this excellent lecture. It certainly helps.

  • @raseshgupta6276
    @raseshgupta6276 2 years ago

    I was struggling to understand the assumptions in simple linear regression through other sources. This video has made it clear

  • @jbstatistics
    @jbstatistics 11 years ago +1

    I'm glad you find them useful John. Best of luck in your course!

  • @vasili111
    @vasili111 10 years ago +1

    Very good videos about simple linear regression. Thank you very much for creating them!

  • @johncasey722
    @johncasey722 11 years ago +1

    I'm so fricking glad these videos align well with my UIUC stats class. Much appreciated!

  • @Maha_s1999
    @Maha_s1999 7 years ago +1

    Prof Balka knocks it out of the park every time! We miss your videos. Could you do some videos on multiple linear regression? Hope you come back soon with new vids!

    • @jbstatistics
      @jbstatistics 7 years ago

      Thanks for the compliment! I'm trying to make time for video production, but probably won't get back to it until the new year. It's been a busy few years, but returning to the videos has always been part of the plan (with multiple regression videos up near the top of the list). Cheers.

    • @Maha_s1999
      @Maha_s1999 7 years ago

      YAY!! Thanks Prof !! I will look out for them.

  • @deniskapliy2642
    @deniskapliy2642 7 years ago

    Small...and then they're big...and then they're small...and then they're big..
    Great video, pretty simplistic, but very useful, thank you!

  • @shayd146
    @shayd146 10 years ago +2

    JB thank you so much you have helped me more than you'll ever know! My only suggestion to you would be to create playlists for associated topics. Other than that your teaching methods are incredible! Thanks!

    • @jbstatistics
      @jbstatistics 10 years ago +1

      Thanks very much for the compliment Shaydoyle! I believe I do have playlists ordered by topic. I've also set up a website (www.jbstatistics.com), which keeps the videos in a more organized fashion. (I'm not plugging anything on the site - it's just organized lists of my videos.) Cheers.

  • @jbstatistics
    @jbstatistics 11 years ago

    We often simply rely on an appropriate sampling design or experimental design to ensure independence. But if, say, we have recorded the observations in some sort of time order, then plots of the residuals through time can give us some indication of whether the residuals are correlated.
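
A minimal sketch of a numeric companion to plotting residuals through time, assuming the residuals are stored in time order. The lag-1 correlation and the Durbin-Watson statistic are swapped in here as summaries; they are not described in the reply above, and the simulated series (including the 0.8 coefficient) are hypothetical.

```python
import numpy as np

def lag1_autocorrelation(residuals):
    """Sample correlation between e[t] and e[t-1] for residuals in time order."""
    e = np.asarray(residuals, dtype=float)
    return np.corrcoef(e[1:], e[:-1])[0, 1]

def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 when successive residuals are uncorrelated."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(1)  # arbitrary seed

# Hypothetical residual series: one independent, one positively autocorrelated.
independent = rng.normal(size=500)
correlated = np.empty(500)
correlated[0] = rng.normal()
for t in range(1, 500):
    correlated[t] = 0.8 * correlated[t - 1] + rng.normal()

for name, series in [("independent", independent), ("correlated", correlated)]:
    print(name, round(lag1_autocorrelation(series), 2), round(durbin_watson(series), 2))
```

Plotting `e[t]` against `e[t-1]` shows the same information as the lag-1 correlation, while a plot of the residuals against time can also reveal drifts or cycles that a single number would miss.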

  • @dedraryqui5606
    @dedraryqui5606 8 years ago +1

    Very clear, easily understandable video

  • @muhammadusama1558
    @muhammadusama1558 4 years ago

    The more I watch your video, the more I hate my uni. Much love man

  • @hritwick1221
    @hritwick1221 3 years ago

    You are a great man. Thanks for your content. I am forever grateful to you.

  • @rodrigopaolinelli6448
    @rodrigopaolinelli6448 2 years ago

    This is definitely a great video, thank you! You are awesome!

  • @carnationize
    @carnationize 5 years ago

    Thanks a lot! All your videos on stats are very clear and have been very helpful!

  • @hichamitani6433
    @hichamitani6433 2 years ago

    Thank you
    Need more videos like these on outliers in residuals

  • @Jelly-cy4vh
    @Jelly-cy4vh 1 year ago

    This was very useful, thank you for all the information

  • @MohamedAbdo-xs7bf
    @MohamedAbdo-xs7bf 5 years ago

    You are Awesome! Thank you so much for sharing your valuable knowledge.

  • @rahkshi96
    @rahkshi96 8 years ago

    Thank you very much jb statistics. This is incredibly helpful and well explained.

    • @jbstatistics
      @jbstatistics 8 years ago

      +Peter Song You are very welcome. Thanks for the compliment!

  • @bharathganeshkumar7071

    Thanks for the video. Short and sweet!

  • @mostafaali8684
    @mostafaali8684 8 years ago +1

    Good video, thank you very much for uploading it.

    • @jbstatistics
      @jbstatistics 8 years ago

      +Mostafa Ali You are very welcome. I'm glad you found it useful!

  • @jamiebond8481
    @jamiebond8481 7 years ago

    good and simple explanation of residual plots and assumptions.

  • @bibekanandasahoo3497
    @bibekanandasahoo3497 2 years ago

    thanks for this great explanation sir .....

  • @Pavankumar-zw2fz
    @Pavankumar-zw2fz 4 years ago

    Very good explanation, Sir. Thank you.

  • @linneajohansson3796
    @linneajohansson3796 3 years ago

    This was very helpful! Thank you!

  • @TB3hnz
    @TB3hnz 4 years ago +2

    4:12 "I'm giving this the joker variance, because *let's put a SMILE on that FACE!* "

  • @angelinelam5862
    @angelinelam5862 4 years ago

    Thank you for this useful video !

  • @syedahmedali7417
    @syedahmedali7417 4 years ago

    you are such a great teacher...

  • @ananyapamde4514
    @ananyapamde4514 3 years ago

    Great video!

  • @savageprincess2796
    @savageprincess2796 1 year ago

    im giving this video A BIG CHECK MARK (2)

  • @jingwen8133
    @jingwen8133 4 years ago

    Very useful video ! Thank you

  • @Jemimakl
    @Jemimakl 5 years ago +2

    So helpful! Thank you for this :)

  • @aayushiagarwal6188
    @aayushiagarwal6188 4 months ago

    Perfectly explained ✨️
    Could you please let me know: if the white centre line (the one around which all the epsilon points lie) is itself not straight and shows a pattern, what do we interpret? Does this mean that the mean of the errors is nonzero and hence our assumption is contradicted?

  • @wenlidi1604
    @wenlidi1604 8 years ago +1

    very clear explanation.

  • @JoaoVitorBRgomes
    @JoaoVitorBRgomes 4 years ago +1

    At 1:56 you can't plot against Y because there is dependence between Y and the residuals? You mean the residuals are the difference between the observed and the estimated, so makes no sense to plot against the observed? But why? Could you clarify this?

  • @dylanburns9381
    @dylanburns9381 7 years ago +1

    great video. such a clear explanation. subbed.

  • @willtube9
    @willtube9 1 year ago

    Prof, Could you do some videos on multiple linear regression? Hope you come back soon with new vids!

  • @sivanschwartz3813
    @sivanschwartz3813 8 years ago +1

    thank you for this amazing video!!!!!!

  • @hanaizdihar4368
    @hanaizdihar4368 3 years ago

    this really helps, thank you

  • @yingdili2219
    @yingdili2219 4 years ago

    perfect video

  • @bhabeshmahanta3408
    @bhabeshmahanta3408 5 years ago

    Very nice teaching. Thanks

  • @Riley8185
    @Riley8185 6 years ago

    These are very good videos

  • @frederickrosas5248
    @frederickrosas5248 2 years ago +1

    Hi Sir. May I know what statistical tests/treatments are used with residual plots to confirm what is allowed and what is not? Thank you for your help.

  • @DHDH_DH
    @DHDH_DH 6 months ago

    Still extremely helpful in 2024

  • @CHIRAGPERLA
    @CHIRAGPERLA 5 years ago

    This is gold!

  • @Stephanbitterwolf
    @Stephanbitterwolf 6 years ago

    Great video! Thank you!

  • @pubgvulcanizer7857
    @pubgvulcanizer7857 3 years ago

    Very nicely explained 👍

  • @frederikhe707
    @frederikhe707 7 years ago

    Nice! The only improvement I would suggest is that you actually name the violated assumptions. I mean people can draw that conclusion on their own but that would make it even more clear.

  • @davidli6068
    @davidli6068 4 years ago

    Thanks a lot, you're a king

  • @aabinamasoodgundroo5971
    @aabinamasoodgundroo5971 2 years ago +1

    my graph is blank, what does that mean?

  • @infoesenn
    @infoesenn 3 years ago

    Question: Why do you assume normally distributed errors? From my understanding, in large samples iid errors from any distribution should be sufficient (Central Limit Theorem).

  • @pate1495
    @pate1495 3 years ago

    I have a question regarding the normal Q-Q plot. On the y-axis, does it show the quantiles of the residual distribution, or the residuals themselves? On the x-axis it shows the quantiles of the residual distribution if it were normal, correct? Thank you, great video!

    • @jbstatistics
      @jbstatistics 3 years ago

      There are different ways of formatting these plots, but here I have the ordinary residuals on the y axis. (The y axis value for any point is the ordinary residual of that point.) Any value could be considered a quantile. The x axis represents the corresponding quantile from the standard normal distribution. So if the residuals were normally distributed, we'd expect those values to fall (roughly) in a linear pattern. (There are some technical issues here, as the observed residuals aren't technically iid normal, even if the OLS assumptions are true, but it's a rough approximation.)
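
A minimal sketch of how the points of such a normal Q-Q plot can be computed, with sorted residuals on the y axis and standard normal quantiles on the x axis. The plotting positions `(i - 0.5) / n` are an assumption (software packages use several slightly different conventions), and the samples below are simulated, not taken from the video.

```python
import numpy as np
from statistics import NormalDist

def normal_qq_points(residuals):
    """Return (theoretical standard normal quantiles, sorted residuals)."""
    e = np.sort(np.asarray(residuals, dtype=float))
    n = e.size
    # Assumed plotting positions; other conventions differ slightly.
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    return theoretical, e

rng = np.random.default_rng(2)  # arbitrary seed

# Hypothetical residuals: one normal sample, one strongly right-skewed sample.
normal_sample = rng.normal(0, 3, size=300)
skewed_sample = rng.exponential(1.0, size=300)

theo_n, obs_n = normal_qq_points(normal_sample)
theo_s, obs_s = normal_qq_points(skewed_sample)

# Correlation of the Q-Q points: close to 1 when the pattern is roughly linear.
r_normal = np.corrcoef(theo_n, obs_n)[0, 1]
r_skewed = np.corrcoef(theo_s, obs_s)[0, 1]
print(f"normal sample: r={r_normal:.3f}, skewed sample: r={r_skewed:.3f}")
```

Plotting `obs` against `theo` gives the Q-Q plot; the correlation is only a rough summary of how straight the pattern is.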

  • @purityrima1366
    @purityrima1366 4 years ago

    Could you please share a link to your video on how to correct the unequal variance problem shown in the plots? Thanks in advance

  • @sanjaypandey6586
    @sanjaypandey6586 2 years ago

    Is it OK in linear regression if the dependent and independent variables are not normally distributed? If not, what would be the best remedy for negative skew and negative kurtosis?

  • @renshiue
    @renshiue 2 years ago

    nice and clear

  • @siryohannb3626
    @siryohannb3626 3 years ago

    thankyou very much

  • @JoaoVitorBRgomes
    @JoaoVitorBRgomes 4 years ago +1

    3:23 what kind of graph indicates non normality?

  • @carlosaugusto212
    @carlosaugusto212 4 years ago

    Shouldn't we analyse the standardized residual plot? I mean, the residuals will be naturally bigger as the y value gets bigger, won't it? If the y range goes from 0,1 to 10 thousand, we expect bigger residual absolute values near the 10 thousand mark. Correct me if I'm wrong, please

  • @ohhrelingo6271
    @ohhrelingo6271 2 years ago

    If I can't find out if the variance is constant from the plot what should I do?

  • @abhishekbhatia6092
    @abhishekbhatia6092 5 years ago +1

    While interpreting the residual plots, can I first pool the residuals in specific bins of X (say each bin 1 unit long or whatever) , so that it looks more like the previous plot with residuals for a given value of X, enabling me to verify the homoscedasticity (and also normality somewhat) more clearly?
    Edit: Q) You mentioned that one of the assumptions was that for a given value of X, the error terms are normally distributed with a constant variance sigma-squared (same for each X). Then at 5:50 you took all the residuals disregarding the value of X, and graphically checked it for normality using a Q-Q plot. Didn't you mention that the normality assumption was for errors for a given value of X? I am confused. pls help.

    • @n9537
      @n9537 5 years ago

      Answer to the edit: If we assume that sampling was completely random, then data from all treatments/groups/sub-populations/values of X were equally likely to be represented in your sample. In that case all the residuals can be pooled and checked for normality. It's the same as checking each treatment group. Note this applies only to the residuals, not the variables.

    • @n9537
      @n9537 5 years ago

      In regression the predictor variables are usually continuous, so it is impossible to check normality for each value of X. In ANOVA, the predictor is usually categorical and you can check residual normality for each treatment group/category. Both ANOVA and regression come under the Generalized Linear Model (GLM), so the assumptions are the same but they play out differently.

    • @n9537
      @n9537 5 years ago

      Actually, all assumptions are on the error terms. But since the residuals are estimates of the errors, we check for "good behavior" in the residuals. We have to make do with what we have (the residuals; the errors are unknown).

    • @n9537
      @n9537 5 years ago

      This also follows from the assumption that error ~ i.i.d. N(0, sigma^2). So all residuals (used in place of the errors as a good estimate) are identically distributed (same mean and variance) and are independent of X, which implies you can't look at a set of residuals and figure out which value of X it came from. For all you know, they all could be from the same value of X or from different values of X. Needless to say, they must be sourced from the same population; you can't pool residuals from different populations or different predictors. So for checking normality of residuals, you can disregard the value of X. This is not the case for Y (the dependent variable).
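
A minimal sketch of the binning idea raised in the question at the top of this thread: group the residuals into bins of X and compare their spread across bins. The helper name `residual_sd_by_bin`, the use of four equal-count bins, and both simulated data sets are hypothetical choices, not something from the video or the replies.

```python
import numpy as np

def residual_sd_by_bin(x, residuals, n_bins=4):
    """Standard deviation of the residuals within equal-count bins of x."""
    x = np.asarray(x, dtype=float)
    e = np.asarray(residuals, dtype=float)
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    # Digitizing on the interior edges gives each point a bin index 0..n_bins-1.
    idx = np.digitize(x, edges[1:-1])
    return np.array([e[idx == k].std(ddof=1) for k in range(n_bins)])

def ols_residuals(x, y):
    """Residuals from a straight-line least squares fit."""
    b1, b0 = np.polyfit(x, y, deg=1)
    return y - (b0 + b1 * x)

rng = np.random.default_rng(3)  # arbitrary seed
x = rng.uniform(1, 10, size=400)

# Hypothetical data sets: constant error variance vs. spread growing with x.
y_const = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=400)
y_fan = 2.0 + 0.5 * x + rng.normal(0, 0.3 * x)

sd_const = residual_sd_by_bin(x, ols_residuals(x, y_const))
sd_fan = residual_sd_by_bin(x, ols_residuals(x, y_fan))
print("constant variance:", np.round(sd_const, 2))
print("fanning variance: ", np.round(sd_fan, 2))
```

Roughly equal standard deviations across bins are consistent with constant variance, while a steady increase matches the fan shape in a residual plot. The bin count is a judgment call: too many bins leaves too few points in each to estimate a spread.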

  • @maydin34
    @maydin34 7 years ago

    Nice video. Thank you.
    But it is just a plot of the random part vs. the independent variable (x). What if we have multiple independent variables (say z, t, w, etc.)? Do we need to check each of those separately, expecting the same variance regardless of the independent variable (random vs. z, random vs. t, etc.)? Or is it OK to just plot predicted y vs. the random part?

  • @GlorifiedTruth
    @GlorifiedTruth 6 years ago

    So helpful! Thanks.

  • @KingQuetzal
    @KingQuetzal 3 years ago

    So I got the 4:12 graph. How can I find out what kind of data I have?

  • @Bombingp
    @Bombingp 6 years ago

    Thanks! Helped a lot!

  • @ujasdiyora2804
    @ujasdiyora2804 8 months ago

    I have one doubt. All the videos in this playlist are about simple linear regression. Do these assumptions also hold for multiple regression and polynomial regression? And are the methods shown at the end for confidence intervals and hypothesis tests, used to find whether coefficients are statistically significant, also applied in other kinds of linear regression?

    • @jbstatistics
      @jbstatistics 8 months ago

      The general idea still holds, yes. The specific formulas for the standard errors, degrees of freedom, etc., will change when there is more than one predictor. And there are many subtleties when it comes to multiple regression, so it's best to learn all about MLR rather than think something like "well, it's just like simple linear regression but with more predictors." That said, yes, the general ideas port over from simple linear regression to multiple linear regression in a natural way. Polynomial regression is a type of multiple regression, so same idea there.
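
A minimal sketch of the last point in the reply above, that polynomial regression is a type of multiple regression: the columns 1, x, and x^2 are simply treated as separate predictors in one least squares fit. The data and coefficients are simulated and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)  # arbitrary seed

# Hypothetical data with a quadratic mean function.
x = rng.uniform(-3, 3, size=150)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0, 0.5, size=150)

# Design matrix with an intercept column and two "predictors": x and x^2.
X = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares, exactly as for any multiple regression.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef

print("estimated coefficients:", np.round(coef, 2))
```

The residuals from this fit can be checked with the same plots as in simple linear regression, for example residuals against fitted values and a normal Q-Q plot.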

  • @Kaa279
    @Kaa279 8 years ago

    At 4:40 you said that there is another feature that we didn't include in our model. But I could also conclude that my model is not good, right?

  • @omkareshpali8486
    @omkareshpali8486 3 years ago

    Hi, I have a question. Let's say I built a model and the R2 value came out at 70%.
    How do I make sure, by looking at the residuals, that this is the maximum variance I can explain?

  • @jt007rai
    @jt007rai 4 years ago

    at 4:21 , can we determine which model will solve this issue based on just looking at this residual plot?

  • @karrisgiani5137
    @karrisgiani5137 7 years ago

    Brill video! If residuals appear to show an inverted U, how can I improve the model?

  • @Kingshuk91
    @Kingshuk91 10 years ago

    Great video. How is plot of e vs time and plot of e(t) vs e(t-1) different?

  • @nightwalkers5579
    @nightwalkers5579 11 years ago

    think you guys should get more views... may be there are not enough stats students in the country

  • @purityrima1366
    @purityrima1366 4 years ago

    @jbstatistics, thank you so much for helping me understand these plots! You are the best teacher:) I give you a big check mark for this video too. awesome explanation!

  • @davidchau6874
    @davidchau6874 11 years ago

    How do I check independence in the residual plot?

  • @Doh333
    @Doh333 8 years ago

    Would it be relevant to make residual plots if I want to check a categorical variable in a linear regression?

    • @jbstatistics
      @jbstatistics 8 years ago

      Yes, some types of residual plots are still informative for categorical explanatory variables. With a categorical variable, a check for linearity is not required, but residual plots can still help to check the normality and common variance assumptions.

  • @simonschacht1810
    @simonschacht1810 11 years ago

    Thank you

  • @VivianGameCollections
    @VivianGameCollections 2 years ago

    Saved my day

  • @Patriciacx
    @Patriciacx 6 years ago

    Thank you!

  • @MohitSingh-ub9gc
    @MohitSingh-ub9gc 6 years ago

    but why are we doing this, please explain ?

  • @alexkay7199
    @alexkay7199 5 years ago

    Please help: Do the residuals have a unit or are they unitless???

    • @jbstatistics
      @jbstatistics 5 years ago

      The residuals are the differences between the observed values of Y and the predicted values of Y. The units of both the observed and predicted values of Y are just the units of Y, and thus the units of the residuals are the units of Y.

    • @alexkay7199
      @alexkay7199 5 years ago

      @@jbstatistics Thanks a lot! Really helpful!!

  • @lowerterror7993
    @lowerterror7993 2 months ago

    No one people like data analytics

  • @yousifsalam
    @yousifsalam 1 year ago

    @4:40 why did you say the residuals are small then big then small..
    don't you mean they're negative, positive.. since their magnitude is the same?

    • @dariopl8664
      @dariopl8664 1 year ago

      I think it's because "ε" is a random variable (as he mentioned in previous videos), and should stay so. If the residuals sit above the line for a stretch of time and then a bunch of them sit below it, that randomness breaks down, since when a whole run of them is up you can foresee they'll be down next (then where's the randomness?).
      I think that if all are up by the same amount they're down (as you see @4:40), then would they still have a normal distribution? No, it would be just a straight-line probability distribution, in which you know that the moment you're up, next will be down, and so on.
      This model assumed ε follows a normal distribution, which is reasonable, since in real life many events occur this way.
      If they are jumping up and down in clusters, then we're no longer dealing with this reasonable distribution. But of course, in the end he'd somehow deal with this time effect that he didn't know beforehand was causing this, maybe so as to normalize them, as they should be to fit the model 🤔. I don't know yet how he tackles this problem. If I find out, I'll tell you.
      Hope this reply was helpful.
      Best regards.

  • @Nias0404
    @Nias0404 6 years ago

    I'm not sure how to get the Q-Q plot... can anyone explain?

    • @jbstatistics
      @jbstatistics 6 years ago

      It's almost always created using software. My intro to normal QQ plots is found here: czcams.com/video/X9_ISJ0YpGw/video.html

    • @Nias0404
      @Nias0404 6 years ago

      Thanks a lot!! Much appreciated

  • @SaranathenArun11E214
    @SaranathenArun11E214 5 years ago

    brilliant

  • @harrygroundwater2590
    @harrygroundwater2590 2 months ago

    Very helpful

  • @ogedaykhan9909
    @ogedaykhan9909 5 years ago

    E X C E L L E N T
    Thanks a lot

  • @MrPreston1056
    @MrPreston1056 10 years ago

    Is there a way to t test the residual plot?

    • @jbstatistics
      @jbstatistics 10 years ago

      What kind of test are you hoping to do? There isn't going to be an overall increasing or decreasing trend in the residuals in simple linear regression. There may be curvature, and we could test to see whether adding higher order terms (e.g. X^2) results in a significantly improved fit. Cheers.
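
A minimal sketch of the kind of test mentioned in the reply above: checking whether adding an X^2 term significantly improves the fit. It computes the usual F statistic that compares the straight-line model with the quadratic model; the helper names and the simulated data are hypothetical.

```python
import numpy as np

def sse(design, y):
    """Residual sum of squares from a least squares fit of y on the design matrix."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def quadratic_term_f_stat(x, y):
    """F statistic for adding an x^2 term to the straight-line model."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    linear = np.column_stack([np.ones(n), x])
    quadratic = np.column_stack([np.ones(n), x, x**2])
    sse_lin, sse_quad = sse(linear, y), sse(quadratic, y)
    # One extra parameter in the numerator; n - 3 residual degrees of freedom below.
    return (sse_lin - sse_quad) / (sse_quad / (n - 3))

rng = np.random.default_rng(5)  # arbitrary seed
x = rng.uniform(0, 10, size=80)

# Hypothetical data with real curvature in the mean function.
y_curved = 1.0 + 1.0 * x + 0.5 * x**2 + rng.normal(0, 1.0, size=80)
f_curved = quadratic_term_f_stat(x, y_curved)
print(f"F statistic for the x^2 term: {f_curved:.1f}")
```

Under the null hypothesis that the quadratic coefficient is 0 (and the usual model assumptions), this statistic is compared with an F distribution on 1 and n - 3 degrees of freedom; it equals the square of the t statistic for the X^2 coefficient.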

    • @MrPreston1056
      @MrPreston1056 10 years ago

      Can you use the t statistic to test H0: E(e-hat_i) = 0 vs. Ha: E(e-hat_i) ≠ 0? Here E is the mean and e-hat is the error.

    • @jbstatistics
      @jbstatistics 10 years ago

      Preston C No, we can't test that. The (observed) residuals always sum to 0 in simple linear regression. When we say the expectation of epsilon is 0 (at every X), we are in effect saying that E(Y|X) falls on the line beta_0 + beta_1X. Conceptually, we could have a different model where the expectation of epsilon was assumed to be 2 instead of 0. This would change very little, except that beta_0 in this model would be 2 less than beta_0 in our usual model. This would unnecessarily complicate things, so we define epsilon to be a random variable with a mean of 0. Cheers.

    • @MrPreston1056
      @MrPreston1056 10 years ago

      But if we tested to see if e_i=0 vs not equal to 0 and rejected the null hypothesis that e_i=0 wouldn't that indicate that the residuals did not sum to zero and our previous assumptions were false?

    • @jbstatistics
      @jbstatistics 10 years ago

      Preston C The observed residuals sum to 0. That is not an assumption, it is a consequence of the least squares fit. If we attempted to test the null hypothesis that the true mean residual is 0 with a t test, we would end up with a test statistic of 0 and a p-value of 1. So that wouldn't really be a test. If you're wondering about testing the null hypothesis that E(epsilon) = 0 at *any given value of X*, that's a bit of a different story. We do something along those lines when we carry out a lack-of-fit test. (This tests the null hypothesis that the means do indeed fall on a line. We can do this sort of thing when we have multiple observations at at least some of the X's.)
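
A minimal sketch of the point made in the reply above: the observed residuals from a least squares line sum to 0 (up to floating-point rounding), so a one-sample t statistic on them is 0 whether or not the straight-line model is appropriate. The data are simulated with a deliberately curved mean function, which is a hypothetical choice made to show that the result does not depend on the model being right.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed

# Hypothetical data whose true mean is NOT a straight line.
x = rng.uniform(0, 10, size=50)
y = 3.0 + 0.2 * x**2 + rng.normal(0, 1.0, size=50)

# Fit a straight line anyway and compute the observed residuals.
b1, b0 = np.polyfit(x, y, deg=1)
e = y - (b0 + b1 * x)

# One-sample t statistic for "mean residual = 0".
t_stat = e.mean() / (e.std(ddof=1) / np.sqrt(e.size))

print(f"sum of residuals: {e.sum():.2e}")
print(f"t statistic:      {t_stat:.2e}")
```

Both printed values are 0 apart from rounding error, even though a residual plot for these data would show clear curvature. That is why the plot, or a lack-of-fit test, is used instead of a t test on the residual mean.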

  • @ukrainrussiawarvideos2810

    A GOOD GAIED Program

  • @unofficiallyofficial2149

    Probably a simple model for college students, not high school.

    • @jbstatistics
      @jbstatistics 5 years ago

      The "simple" in simple linear regression refers to there being only one predictor (one x), and not because it's simple or easy. It's just the well-established name of the model. Unlike many others, I don't use any clickbait words like "easy" or "simple".

    • @unofficiallyofficial2149
      @unofficiallyofficial2149 5 years ago

      @@jbstatistics Oh, I understand. Thanks.

  • @JoshuaDHarvey
    @JoshuaDHarvey 4 years ago

    Nothing he is explaining makes any sense.