Simple Linear Regression: Checking Assumptions with Residual Plots

jbstatistics

324,732 views

Added 4 Dec 2012
An investigation of the normality, constant variance, and linearity assumptions of the simple linear regression model through residual plots.
The pain-empathy data is estimated from a figure given in:
Singer et al. (2004). Empathy for pain involves the affective but not sensory components of pain. Science, 303:1157-1162.
The Janka hardness-density data is found in:
Hand, D.J., Daly, F., Lunn, A.D., McConway, K., and Ostrowski, E., editors (1994). The Handbook of Small Data Sets. Chapman & Hall, London.
Original source: Williams, E.J. (1959). Regression Analysis. John Wiley & Sons, New York. Page 43, Table 3.7.

Comments • 148

@48956l 8 years ago ⁺¹⁵²
I'M GIVIN THIS VIDEO THE BIG CHECK MARK
@jbstatistics 7 years ago ⁺¹¹
Thanks!
@read89simo 7 years ago ⁺⁵
ME TOO + A BIG SUBSCRIBE
@messididit 4 years ago ⁺²
@read89simo + A BIG LIKE BUTTON
@user-pl7zr2jm5h 5 months ago ⁺⁴
This was posted 11 years ago T-T and has the best explanations and videos on statistics I have ever found. Thank you so much for all your hard work and legacy; I hope you know you're my savior.
@jbstatistics 5 months ago ⁺³
I'm glad to be of help! 11 years, where'd they go? :)
@jbstatistics 11 years ago ⁺⁵⁰
"think you guys should get more views..."
Thanks! (And I'll take as a compliment that you said "you guys", since this is a one man show.) Getting lots of views isn't very high on my priority list -- I'm just trying to provide the best resources for my students that I can. (I haven't done any promotion, and I don't allow ads on the videos.)
There are many students in intro stats in North America and around the world, and I'm glad that some of them find my videos helpful.
@williamlee0 4 years ago ⁺¹
I'd upvote you x10 if I could just for the anti-advert policy.
@nkululekoshabane3373 9 years ago ⁺⁴⁰
One of the best, if not the best, video on regression analysis I've seen. Thank you very much for creating it. Your service is highly appreciated.
@jbstatistics 9 years ago ⁺⁴
Nkululeko Shabane You are very welcome, and thank you very much for the compliment!
@GuppyPal 2 years ago ⁺¹³
This is exactly what I have needed. My professor goes over these plots but has been doing statistics at a high level so long that I think it's hard for him to relate to someone who is new to it. I really needed someone to just explain it all from start to finish, and you did that. Thank you so much! Your videos are so, so helpful. Sincerely, a first year statistics graduate student.
@jbstatistics 2 years ago ⁺¹
I'm glad to be of help!
@doodelay 5 years ago
"The residual plot removes that increasing trend and then re-scales the y axis, so it's a little bit easier to see these issues.. sometimes in the residual plot." Now that is some serious insight. Thank you so much and this video was superb with really excellent examples!
@jbstatistics 5 years ago ⁺¹
Thanks for the kind words!
@jbstatistics 11 years ago ⁺²
You are very welcome Simon!
@snake1625b 8 years ago
Excellent methods used to help students learn in this vid. This is the future of education!
@valeriereid2337 1 year ago ⁺¹
Thank you for this excellent lecture. It certainly helps.
@raseshgupta6276 2 years ago
I was struggling to understand the assumptions in simple linear regression through other sources. This video has made it clear.
@jbstatistics 11 years ago ⁺¹
I'm glad you find them useful John. Best of luck in your course!
@vasili111 10 years ago ⁺¹
Very good videos about simple linear regression. Thank you very much for creating them!
@jbstatistics 10 years ago
You are very welcome!
@johncasey722 11 years ago ⁺¹
I'm so fricking glad these videos align well with my UIUC stats class. Much appreciated!
@Maha_s1999 7 years ago ⁺¹
Prof Balka knocks it out of the park every time! We miss your videos. Could you do some videos on multiple linear regression? Hope you come back soon with new vids!
@jbstatistics 7 years ago
Thanks for the compliment! I'm trying to make time for video production, but probably won't get back to it until the new year. It's been a busy few years, but returning to the videos has always been part of the plan (with multiple regression videos up near the top of the list). Cheers.
@Maha_s1999 7 years ago
YAY!! Thanks Prof !! I will look out for them.
@deniskapliy2642 7 years ago
Small...and then they're big...and then they're small...and then they're big..
Great video, pretty simplistic, but very useful, thank you!
@shayd146 10 years ago ⁺²
JB thank you so much you have helped me more than you'll ever know! My only suggestion to you would be to create playlists for associated topics. Other than that your teaching methods are incredible! Thanks!
@jbstatistics 10 years ago ⁺¹
Thanks very much for the compliment Shaydoyle! I believe I do have playlists ordered by topic. I've also set up a website (www.jbstatistics.com), which keeps the videos in a more organized fashion. (I'm not plugging anything on the site - it's just organized lists of my videos.) Cheers.
@jbstatistics 11 years ago
We often simply rely on an appropriate sampling design or experimental design to ensure independence. But if, say, we have recorded the observations in some sort of time order, then plots of the residuals through time can give us some indication of whether the residuals are correlated.
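As a rough illustration of the time-order check described in this reply, the sketch below fits a least squares line to made-up data recorded in time order and computes the lag-1 autocorrelation of the residuals, which is a numerical companion to plotting e(t) against e(t-1). The data, the slow drift added to the responses, and the helper names `fit_line` and `lag1_autocorrelation` are all assumptions for illustration; none of them come from the video.

```python
import math

def fit_line(x, y):
    """Return the least squares (intercept, slope) for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def lag1_autocorrelation(values):
    """Sample correlation between consecutive values e(t-1) and e(t)."""
    n = len(values)
    mean = sum(values) / n
    num = sum((values[t] - mean) * (values[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in values)
    return num / den

# Hypothetical data: 40 observations listed in the order they were recorded.
n = 40
x = [(7 * t) % 10 for t in range(n)]                        # predictor, unrelated to time
drift = [math.sin(2 * math.pi * t / 20) for t in range(n)]  # slow drift through time
y = [2.0 + 0.5 * xi + d for xi, d in zip(x, drift)]

intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# A value near 0 is consistent with uncorrelated residuals; a value near +1 or -1
# suggests consecutive residuals are related, so independence is doubtful.
r1 = lag1_autocorrelation(residuals)
print(f"lag-1 autocorrelation of residuals: {r1:.2f}")
```

Here the drift makes neighbouring residuals look alike, so the autocorrelation comes out strongly positive. A plot of the residuals against time order would show the same thing as long runs above and below zero.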
@dedraryqui5606 8 years ago ⁺¹
Very clear, easily understandable video.
@muhammadusama1558 4 years ago
The more I watch your video, the more I hate my uni. Much love man
@hritwick1221 3 years ago
You are great, man. Thanks for your content. I am forever grateful to you.
@rodrigopaolinelli6448 2 years ago
This is definitely a great video, thank you! You are awesome!
@carnationize 5 years ago
Thanks a lot! All your videos on stats are very clear and have been very helpful!
@jbstatistics 5 years ago
You are very welcome!
@hichamitani6433 2 years ago
Thank you
Need more videos like these on outliers in residuals
@Jelly-cy4vh 1 year ago
This was very useful, thank you for all the information
@MohamedAbdo-xs7bf 5 years ago
You are Awesome! Thank you so much for sharing your valuable knowledge.
@rahkshi96 8 years ago
Thank you very much jb statistics. This is incredibly helpful and well explained.
@jbstatistics 8 years ago
+Peter Song You are very welcome. Thanks for the compliment!
@bharathganeshkumar7071 5 years ago
Thanks for the video. Short and sweet!
@jbstatistics 5 years ago
You are very welcome!
@mostafaali8684 8 years ago ⁺¹
Good video, thank you very much for uploading it.
@jbstatistics 8 years ago
+Mostafa Ali You are very welcome. I'm glad you found it useful!
@jamiebond8481 7 years ago
Good and simple explanation of residual plots and assumptions.
@jbstatistics 7 years ago
Thanks!
@bibekanandasahoo3497 2 years ago
Thanks for this great explanation, sir.
@Pavankumar-zw2fz 4 years ago
Very good explanation, sir. Thank you.
@linneajohansson3796 3 years ago
This was very helpful! Thank you!
@TB3hnz 4 years ago ⁺²
4:12 "I'm giving this the joker variance, because *let's put a SMILE on that FACE!* "
@angelinelam5862 4 years ago
Thank you for this useful video!
@syedahmedali7417 4 years ago
You are such a great teacher...
@jbstatistics 4 years ago ⁺¹
Thanks!
@ananyapamde4514 3 years ago
Great video!
@savageprincess2796 1 year ago
im giving this video A BIG CHECK MARK (2)
@jingwen8133 4 years ago
Very useful video! Thank you
@Jemimakl 5 years ago ⁺²
So helpful! Thank you for this :)
@selinechung1692 5 years ago
LMAO BRO WHY ARE YOU HERE
@Jemimakl 5 years ago
Seline Chung WHY ARE YOU HERE
@selinechung1692 5 years ago
@Jemimakl WHY ARE YOU SO HARDWORKING
@selinechung1692 5 years ago
@Jemimakl BRO YOU STARTED A WEEK AGO
@Jemimakl 5 years ago
Seline Chung I WAS DOING HOMEWORK
@aayushiagarwal6188 4 months ago
Perfectly explained ✨️
Could you please let me know: if the white centre line (the one around which all the epsilon points are scattered) is itself not straight and shows a pattern, what do we interpret? Does this mean that the mean of the errors is nonzero and hence our assumption is contradicted?
@wenlidi1604 8 years ago ⁺¹
Very clear explanation.
@JoaoVitorBRgomes 4 years ago ⁺¹
At 1:56 you can't plot against Y because there is dependence between Y and the residuals? You mean the residuals are the difference between the observed and the estimated, so makes no sense to plot against the observed? But why? Could you clarify this?
@dylanburns9381 7 years ago ⁺¹
Great video. Such a clear explanation. Subbed.
@willtube9 1 year ago
Prof, could you do some videos on multiple linear regression? Hope you come back soon with new vids!
@sivanschwartz3813 8 years ago ⁺¹
Thank you for this amazing video!!!!!!
@jbstatistics 8 years ago ⁺¹
You're very welcome!
@hanaizdihar4368 3 years ago
This really helps, thank you
@yingdili2219 4 years ago
Perfect video
@bhabeshmahanta3408 5 years ago
Very nice teaching. Thanks
@Riley8185 6 years ago
These are very good videos
@frederickrosas5248 2 years ago ⁺¹
Hi Sir. May I know what statistical tests/treatments are used with residual plots to confirm what is allowed and what is not? Thank you for your help.
@DHDH_DH 6 months ago
Still extremely helpful in 2024
@CHIRAGPERLA 5 years ago
This is gold!
@Stephanbitterwolf 6 years ago
Great video! Thank you!
@pubgvulcanizer7857 3 years ago
Very nicely explained 👍
@frederikhe707 7 years ago
Nice! The only improvement I would suggest is that you actually name the violated assumptions. I mean people can draw that conclusion on their own but that would make it even more clear.
@davidli6068 4 years ago
Thanks a lot, you're a king
@aabinamasoodgundroo5971 2 years ago ⁺¹
My graph is blank, what does that mean?
@infoesenn 3 years ago
Question: Why do you assume normally distributed errors? From my understanding, in large samples iid errors from any distribution should be sufficient (Central Limit Theorem).
@pate1495 3 years ago
I have a question regarding the normal Q-Q plot. On the y-axis, does it show the quantiles of the residual distribution, or the residuals themselves? On the x-axis it shows the quantiles of the residual distribution if it were normal, correct? Thank you, great video!
@jbstatistics 3 years ago
There are different ways of formatting these plots, but here I have the ordinary residuals on the y axis. (The y axis value for any point is the ordinary residual of that point.) Any value could be considered a quantile. The x axis represents the corresponding quantile from the standard normal distribution. So if the residuals were normally distributed, we'd expect those values to fall (roughly) in a linear pattern. (There are some technical issues here, as the observed residuals aren't technically iid normal, even if the OLS assumptions are true, but it's a rough approximation.)
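To make the axes in this reply concrete, here is a minimal sketch of how the points of a normal Q-Q plot can be computed: the sorted residuals go on the y axis and matching standard normal quantiles go on the x axis. The residual values are made up, the helper name `normal_qq_points` is hypothetical, and the plotting positions (i - 0.5)/n are only one common convention; software packages differ in the exact formula they use.

```python
from statistics import NormalDist

def normal_qq_points(residuals):
    """Pair each sorted residual with a standard normal quantile.

    Uses plotting positions (i - 0.5) / n for i = 1..n, which is an assumed
    convention; other packages use slightly different positions.
    """
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Hypothetical residuals from some fitted line (they sum to zero).
residuals = [-1.8, 0.4, 1.1, -0.3, 0.9, -0.7, 2.2, -1.2, 0.1, -0.7]

points = normal_qq_points(residuals)
for quantile, resid in points:
    print(f"x (normal quantile) = {quantile:7.3f}   y (residual) = {resid:5.2f}")
```

If the residuals were roughly normal, these (x, y) pairs would fall close to a straight line when plotted. Systematic bending at the ends is the usual sign of skewness or heavy tails.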
@purityrima1366 4 years ago
Could you please share a link to your video on how to correct the unequal variance problem shown in the plots? Thanks in advance
@sanjaypandey6586 2 years ago
Is it OK in linear regression if the dependent and independent variables are not normally distributed? If not, what would be the best remedy for negative skew and negative kurtosis?
@renshiue 2 years ago
Nice and clear
@siryohannb3626 3 years ago
Thank you very much
@JoaoVitorBRgomes 4 years ago ⁺¹
3:23 What kind of graph indicates non-normality?
@carlosaugusto212 4 years ago
Shouldn't we analyse the standardized residual plot? I mean, the residuals will be naturally bigger as the y value gets bigger, won't they? If the y range goes from 0,1 to 10 thousand, we expect bigger residual absolute values near the 10 thousand mark. Correct me if I'm wrong, please
@ohhrelingo6271 2 years ago
If I can't find out if the variance is constant from the plot what should I do?
@abhishekbhatia6092 5 years ago ⁺¹
While interpreting the residual plots, can I first pool the residuals in specific bins of X (say each bin 1 unit long or whatever) , so that it looks more like the previous plot with residuals for a given value of X, enabling me to verify the homoscedasticity (and also normality somewhat) more clearly?
Edit: Q) You mentioned that one of the assumptions was that for a given value of X, the error terms are normally distributed with a constant variance sigma-squared (same for each X). Then at 5:50 you took all the residuals disregarding the value of X, and graphically checked it for normality using a Q-Q plot. Didn't you mention that the normality assumption was for errors for a given value of X? I am confused. pls help.
@n9537 5 years ago
Answer to edit: If we assume that sampling was completely random, then data from all treatments/groups/sub-populations/values of X were equally likely to be represented in your sample. In that case all the residuals can be clubbed together and checked for normality. It's the same as checking for each treatment group. Note this applies only to the residuals, not the variables.
@n9537 5 years ago
In regression the predictor variables are usually continuous, so it is impossible to check normality for each value of X. In the case of ANOVA, the predictor is usually categorical and you can venture to check residual normality for each treatment group/category. Both ANOVA and regression come under the Generalized Linear Model (GLM), so the assumptions are the same but they play out differently.
@n9537 5 years ago
Actually, all assumptions are on the error terms. But since the residuals are an estimate of the errors, we check for "good behavior" in the residuals. We have to make do with what we have (which is the residuals; the errors are unknown).
@n9537 5 years ago
This also follows from the assumption that error ~ i.i.d. N(0, sigma^2). So all residuals (used in place of the errors as a good estimate) are identically distributed (same mean and variance) and are independent of X, which implies you can't look at a set of residuals and figure out which value of X they came from. For all you know, they all could be from the same value of X or different values of X. Needless to say, they must be sourced from the same population; you can't club residuals from different populations/different predictor(s). So for checking normality of residuals, you can disregard the value of X. This is not the case for Y (the dependent variable).
@maydin34 7 years ago
Nice video. Thank you.
But it is just plotted between the random part and the independent variable (x). What if we have multiple independent variables (say z, t, w, etc.)? Do we need to check all of those separately, expecting the very same variance again regardless of the independent variable (random vs z, random vs t, etc.)? Or is it OK to just plot predicted y vs the random part?
@GlorifiedTruth 6 years ago
So helpful! Thanks.
@KingQuetzal 3 years ago
So I got the 4:12 graph how can I find out what kind of data I have?
@Bombingp 6 years ago
Thanks! Helped a lot!
@jbstatistics 6 years ago
You are welcome!
@ujasdiyora2804 8 months ago
I have one doubt. We are talking about simple linear regression in all of the videos in this playlist. Are these assumptions also true for linear regression in general, multiple regression, and polynomial regression? And is all of this theory of finding confidence intervals and, at the end, hypothesis testing to determine whether coefficients are statistically significant also applied in other kinds of linear regression?
@jbstatistics 8 months ago
The general idea still holds, yes. The specific formulas for the standard errors, degrees of freedom, etc., will change when there is more than one predictor. And there are many subtleties when it comes to multiple regression, so it's best to learn all about MLR rather than think something like "well, it's just like simple linear regression but with more predictors." That said, yes, the general ideas port over from simple linear regression to multiple linear regression in a natural way. Polynomial regression is a type of multiple regression, so same idea there.
@Kaa279 8 years ago
At 4:40 you said that there is another feature that we didn't include in our model. But we can also conclude that my model is not good, right?
@omkareshpali8486 3 years ago
Hi I have a question, let's say I built a model and the R2 value came out 70%
How do I make sure that is the maximum variance I can explain by looking at the residuals?
@jt007rai 4 years ago
At 4:21, can we determine which model will solve this issue based on just looking at this residual plot?
@karrisgiani5137 7 years ago
Brill video! If residuals appear to show an inverted U, how can I improve the model?
@Kingshuk91 10 years ago
Great video. How is plot of e vs time and plot of e(t) vs e(t-1) different?
@nightwalkers5579 11 years ago
think you guys should get more views... maybe there are not enough stats students in the country
@purityrima1366 4 years ago
@jbstatistics, thank you so much for helping me understand these plots! You are the best teacher:) I give you a big check mark for this video too. awesome explanation!
@davidchau6874 11 years ago
How do we check independence in the residual plot?
@Doh333 8 years ago
Would it be relevant to make residual plots if I want to check a categorical variable in a linear regression?
@jbstatistics 8 years ago
Yes, some types of residual plots are still informative for categorical explanatory variables. With a categorical variable, a check for linearity is not required, but residual plots can still help to check the normality and common variance assumptions.
@simonschacht1810 11 years ago
Thank you
@VivianGameCollections 2 years ago
Saved my day
@Patriciacx 6 years ago
Thank you!
@jbstatistics 6 years ago
You are very welcome!
@sarita-ey5cw 5 years ago
+jbstatistics ऊघजेऐऊ
@MohitSingh-ub9gc 6 years ago
but why are we doing this, please explain ?
@alexkay7199 5 years ago
Please help: Do the residuals have a unit or are they unitless???
@jbstatistics 5 years ago
The residuals are the differences between the observed values of Y and the predicted values of Y. The units of both the observed and predicted values of Y are just the units of Y, and thus the units of the residuals are the units of Y.
@alexkay7199 5 years ago
@jbstatistics Thanks a lot! Really helpful!!
@lowerterror7993 2 months ago
No one likes data analytics
@yousifsalam 1 year ago
@4:40 why did you say the residuals are small then big then small..
don't you mean they're negative, positive.. since their magnitude is the same?
@dariopl8664 1 year ago
I think it's because "ε" is a random variable (as he mentioned in previous videos), and should stay so. If the residuals are up for a stretch of time and then a bunch of them are down, that randomness breaks down, since when a whole group of them are up you can foresee they'll be down next time (then where's the randomness?).
I think that if all are up the same amount they're down (as you see @4:40), then, would they still have a normal distribution? No, it would be just a straight line probability distribution, in which you know the moment you're up, next will be down, and so on.
This model assumed ε follows a normal distribution, which is reasonable, since in real life many events occur this way.
If they are jumping up and down in clusters then we're no longer dealing with this reasonable distribution. But of course, in the end he'd somehow deal with this time effect that he didn't know beforehand was causing this, maybe so as to normalize them, as they should be to fit the model🤔. I don't know yet how he tackles this problem. If I find out about it I'll tell you.
Hope this reply was helpful.
Best regards.
@Nias0404 6 years ago
I'm not sure how to get the Q-Q plot... can anyone explain?
@jbstatistics 6 years ago
It's almost always created using software. My intro to normal QQ plots is found here: czcams.com/video/X9_ISJ0YpGw/video.html
@Nias0404 6 years ago
Thanks a lot!! Much appreciated
@SaranathenArun11E214 5 years ago
brilliant
@jbstatistics 5 years ago
Thanks!
@harrygroundwater2590 2 months ago
Very helpful
@ogedaykhan9909 5 years ago
E X C E L L E N T
Thanks a lot
@MrPreston1056 10 years ago
Is there a way to t test the residual plot?
@jbstatistics 10 years ago
What kind of test are you hoping to do? There isn't going to be an overall increasing or decreasing trend in the residuals in simple linear regression. There may be curvature, and we could test to see whether adding higher order terms (e.g. X^2) results in a significantly improved fit. Cheers.
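One way to carry out the kind of test mentioned in this reply (does adding an X^2 term significantly improve the fit?) is an extra-sum-of-squares F test. The sketch below is an illustration under stated assumptions, not necessarily the approach used in the video: the data are made up, the helper name `residuals_after_line` is hypothetical, and the reduction in the error sum of squares is obtained by regressing the straight-line residuals on the part of x^2 that is not explained by x.

```python
def residuals_after_line(x, y):
    """Residuals from the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Hypothetical data with clear curvature (roughly y = x^2 plus small noise).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.3, 3.8, 9.1, 15.6, 25.2, 36.3, 48.7, 64.0]
n = len(x)

e = residuals_after_line(x, y)                         # straight-line residuals
z = residuals_after_line(x, [xi ** 2 for xi in x])     # x^2 with intercept and x removed

sse_line = sum(ei ** 2 for ei in e)                    # error SS, straight-line model
ss_extra = sum(zi * ei for zi, ei in zip(z, e)) ** 2 / sum(zi ** 2 for zi in z)
sse_quad = sse_line - ss_extra                         # error SS, quadratic model

# F statistic with 1 and n - 3 degrees of freedom for the added x^2 term.
f_stat = ss_extra / (sse_quad / (n - 3))
print(f"SSE (line) = {sse_line:.2f}, SSE (quadratic) = {sse_quad:.2f}, F = {f_stat:.1f}")
```

The statistic is compared with an F distribution on 1 and n - 3 degrees of freedom. With n = 8 the 5% critical value is about 6.61, so a much larger value, as with these deliberately curved data, indicates that the quadratic term improves the fit.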
@MrPreston1056 10 years ago
Can you use the t statistic to test H0: E(ê_i) = 0 vs. Ha: E(ê_i) ≠ 0, with E being the mean and ê_i being the (estimated) error?
@jbstatistics 10 years ago
Preston C No, we can't test that. The (observed) residuals always sum to 0 in simple linear regression. When we say the expectation of epsilon is 0 (at every X), we are in effect saying that E(Y|X) falls on the line beta_0 + beta_1X. Conceptually, we could have a different model where the expectation of epsilon was assumed to be 2 instead of 0. This would change very little, except that beta_0 in this model would be 2 less than beta_0 in our usual model. This would unnecessarily complicate things, so we define epsilon to be a random variable with a mean of 0. Cheers.
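The point about the intercept absorbing a nonzero error mean can be written out in one line. The starred symbols below are notation introduced here for the hypothetical alternative model; they do not appear in the video.

```latex
% Hypothetical alternative model whose error term has mean 2:
%   Y = \beta_0^{*} + \beta_1 X + \varepsilon^{*}, \qquad E(\varepsilon^{*}) = 2.
% Define \varepsilon = \varepsilon^{*} - 2, so that E(\varepsilon) = 0. Then
\begin{aligned}
Y &= \beta_0^{*} + \beta_1 X + \varepsilon^{*} \\
  &= (\beta_0^{*} + 2) + \beta_1 X + (\varepsilon^{*} - 2) \\
  &= \beta_0 + \beta_1 X + \varepsilon ,
  \qquad \text{where } \beta_0 = \beta_0^{*} + 2 \text{ and } E(\varepsilon) = 0 .
\end{aligned}
```

So the two models describe the same mean line, E(Y|X) = beta_0 + beta_1 X, and the intercept of the alternative model is 2 less than the usual one, which matches the reply above.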
@MrPreston1056 10 years ago
But if we tested to see if e_i=0 vs not equal to 0 and rejected the null hypothesis that e_i=0 wouldn't that indicate that the residuals did not sum to zero and our previous assumptions were false?
@jbstatistics 10 years ago
Preston C The observed residuals sum to 0. That is not an assumption, it is a consequence of the least squares fit. If we attempted to test the null hypothesis that the true mean residual is 0 with a t test, we would end up with a test statistic of 0 and a p-value of 1. So that wouldn't really be a test. If you're wondering about testing the null hypothesis that E(epsilon) = 0 at *any given value of X*, that's a bit of a different story. We do something along those lines when we carry out a lack-of-fit test. (This tests the null hypothesis that the means do indeed fall on a line. We can do this sort of thing when we have multiple observations at at least some of the X's.)
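A small numerical check of this reply, using made-up data: after a least squares fit the observed residuals sum to zero (up to floating-point rounding), so a one-sample t statistic computed on them is zero as well. The data values and variable names are assumptions for illustration only.

```python
import math

# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.3, 6.9]
n = len(x)

# Least squares slope and intercept for simple linear regression.
x_bar = sum(x) / n
y_bar = sum(y) / n
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# One-sample t statistic for H0: mean residual = 0.
mean_resid = sum(residuals) / n
sd_resid = math.sqrt(sum((r - mean_resid) ** 2 for r in residuals) / (n - 1))
t_stat = mean_resid / (sd_resid / math.sqrt(n))

print(f"sum of residuals = {sum(residuals):.2e}")  # zero up to rounding error
print(f"t statistic      = {t_stat:.2e}")          # also zero up to rounding error
```

Because the zero sum is built into the fit, this "test" can never reject, whatever the data look like. Checking whether the means fall on a line at each X needs something like the lack-of-fit test mentioned in the reply.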
@ukrainrussiawarvideos2810 8 years ago
A GOOD GAIED Program
@unofficiallyofficial2149 5 years ago
Probably a simple model for college students, not high school.
@jbstatistics 5 years ago
The "simple" in simple linear regression refers to there being only one predictor (one x), and not because it's simple or easy. It's just the well-established name of the model. Unlike many others, I don't use any clickbait words like "easy" or "simple".
@unofficiallyofficial2149 5 years ago
@jbstatistics Oh, I understand. Thanks.
@JoshuaDHarvey 4 years ago
Nothing he is explaining makes any sense.
