Why Dividing By N Underestimates the Variance
- Added on 14 July 2019
- This is the follow up video to:
Statistics Fundamentals: The Mean, Variance and Standard Deviation
• Calculating the Mean, ...
In it, we show exactly why, when we estimate the variance, dividing by 'n' underestimates the value we are interested in. It also describes why we square each term instead of taking the absolute value. The visuals used in this StatQuest make it easy to remember why we should divide by n-1, and this will save us from falling into a very common pitfall.
If you'd like to support StatQuest, please consider...
Support StatQuest by buying The StatQuest Illustrated Guide to Machine Learning!!!
PDF - statquest.gumroad.com/l/wvtmc
Paperback - www.amazon.com/dp/B09ZCKR4H6
Kindle eBook - www.amazon.com/dp/B09ZG79HXC
Patreon: / statquest
...or...
CZcams Membership: / @statquest
...a cool StatQuest t-shirt or sweatshirt:
shop.spreadshirt.com/statques...
...buying one or two of my songs (or go large and get a whole album!)
joshuastarmer.bandcamp.com/
...or just donating to StatQuest!
www.paypal.me/statquest
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
/ joshuastarmer
Corrections:
3:23 I should have said "To understand why dividing by n underestimates the variation around the population mean".
3:40 The estimated mean was switched with the population mean.
#statquest #variance
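The description's main claim, that dividing by n underestimates the variance while dividing by n-1 does not, can be checked with a small simulation. This is only a sketch, not anything from the video: the function name, the normal population with standard deviation 2 (true variance 4), the sample size of 5, and the seed are all assumptions chosen for illustration.

```python
import random


def average_variance_estimates(n=5, trials=20000, mu=0.0, sigma=2.0, seed=0):
    """Return (average divide-by-n estimate, average divide-by-(n-1) estimate).

    Hypothetical helper: draws many small samples from a normal population
    and averages the two variance estimates across all samples.
    """
    rng = random.Random(seed)
    total_by_n = 0.0
    total_by_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        # Sum of squared differences around the SAMPLE mean.
        ss = sum((x - sample_mean) ** 2 for x in sample)
        total_by_n += ss / n
        total_by_n_minus_1 += ss / (n - 1)
    return total_by_n / trials, total_by_n_minus_1 / trials


if __name__ == "__main__":
    avg_n, avg_n_minus_1 = average_variance_estimates()
    print("true variance:          4.0")
    print("average dividing by n:  ", round(avg_n, 3))
    print("average dividing by n-1:", round(avg_n_minus_1, 3))
```

Under these assumptions the divide-by-n average should settle near (n-1)/n times the true variance, while the divide-by-(n-1) average should settle near the true variance itself.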
Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
BAM BUM hahah
Please Give Video on degrees of freedom please🙇
At around 8:35, you should've used the asterisk '*' character instead of the 'x' character for multiplication. I was a bit confused and thought you wrote 2*(x-v)*x-1 instead of 2*(x-v)*(-1). Great video by the way!
@@m3c4nyku43 noted
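For reference, the chain-rule step discussed in this thread can be written out as below. This is a sketch that assumes, following the comment, that x is a data point and v is the candidate center value; the index i and the symbol \bar{x} for the sample mean are notation added here.

```latex
\frac{d}{dv}(x - v)^2 = 2(x - v)\cdot(-1) = -2(x - v)

\frac{d}{dv}\sum_{i=1}^{n}(x_i - v)^2 = -2\sum_{i=1}^{n}(x_i - v) = 0
\quad\Longrightarrow\quad
v = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}
```

Setting the derivative to zero shows that the sum of squared differences is smallest when v is the sample mean.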
is there some sort of award we can give this guy? please?!
:)
I think we're encouraged to purchase a double dam t-shirt or sweatshirt, which is more of a financial incentive than an award but who doesn't like getting paid to be awesome? I'll probably pick one up this weekend
@@jacobmoore8734 Thanks! :)
@@jacobmoore8734 award + t-shirt = double bam! just ordered my own shirt. gonna wear it to my statistics test in 2 weeks
Really, your way is too unique. one of the best
Amazing Josh!!
I can't imagine how much hard work goes into simplifying the complex statistics concepts and coming up with these amazing videos. And on top of that your ingenious ideas of adding humor and musical creativity, taking the content to another level.
If there was an Oscar for tutoring you'd be the undisputed winner.
BAMMM !!- simply the best educator on CZcams....
BAM! :)
I searched for this on a number of online resources, some mentioned "n" while others "n-1", leaving me confused. This is the best possible explanation to the problem you made it really easy for us to understand. Thanks a lot !!! Bammmm subscribed and shared with friends.
Awesome!!! Thank you very much for subscribing and sharing my videos with your friends. :)
Josh, this is total hypnotism you did with BAMs, echoes, and other sounds. You mastered the art of making us stick there. I've been searching for statistics and machine learning videos that have a kind of roadmap and simple explanations for complex topics, and this is it. You saved my life for sure, my donation is on its way, I know anything is small for what goes into making these. Hats off to you, you are a LEGEND. We owe you...
Wow, thank you!
Came to this video for "Why Dividing By N Underestimates the Variance" but got to know why absolute values are not used in Variance calculation. Literally cried, Prof. Josh.. Kudos to you. You are supporting me to understand the topics in statistics. I will support you regularly after I get a job soon. And I'm sure your teachings are required for many of the upcoming students in the coming decades. In India we have a concept called "Guru Kulam", and I see you as my guru (Not the term commonly known in the western world, this is more about respect)
Thank you so much!!! It means a lot to me.
Michael Scott: Why don't you explain this to me like I'm five?
Josh Starmer: Bammm!!
and understood ...
thank you : ) !
BAM! :)
@@statquest Can you provide the slides for all the statistics videos you used to explain the concepts?
@@Deepak-uv8du I have PDF study guides for some of my videos here: statquest.org/studyguides/
I've spent 2 years asking how to plot variance and why sample variance (also sd) is divided by n-1. And this is the best explanation I've ever had
Awesome! :)
I'd rather get a clear and understandable explanation with "BAMS" like I'm five than a 50-page-long explanation with words like "trivial" and abbreviations (q.e.d.) and just feel depressed and left clueless. And another very important thing: only if you *really* understand the topic can you explain it with easy words. Very well done, Josh! Thank you very much!
Thank you very much!!!! :)
Kids today are so lucky they can review their stats online like this with great teachers.
:)
I have nothing but admiration; this is the clearest explanation that I've seen so far that does not shy away from the underlying math, yet still keeping it understandable for those with minimal math background. I feel like a bit of a fool when I see the contrast between my own attempts to explain this correction factor and your explanation.
I'm glad you like the video so much. Thanks! :)
Yet another video from this channel that leaves me speechless. I've never really understood this concept until I've watched your video. Thank you very much, again.
Wow, thank you!
Yes! I had already feared that the n-? question won't be explained. Glad to hear that you will explain this unsolved mystery in the next video!
Unfortunately, it will be a while before I get to it. I've got covariance and correlation coming up next, then a few machine learning videos, but then I'll loop back to expected values. It's a topic that I've wanted to work on for quite some time.
My uneducated guess about n-x is that the bigger the magnitude difference between the population and your sample size, the larger x would be. Because as this magnitude gets smaller and smaller, the need for x to have any significant value disappears. My biggest question is why x would lead to this unitary value when your sample size is small. But we'll see what Josh explains about that.
Intuitively: the number you're dividing by stands for the degrees of freedom you have. In other words: how many data points are allowed to vary freely. The reason that this is 1 less here is, as the video hinted at, because of the sample mean. If someone shared with you n-1 data points of their sample distribution of n points, and you know what the sample mean is, then you can easily calculate what the last data point is. I.e. that last data point doesn't have any freedom to vary, just because it was crucial in defining the sample mean. This doesn't matter if you know what the population mean is, precisely because the sample distribution didn't decide its value. Therefore all n values in a sample distribution with known population mean can be used to make an unbiased estimator, while only n-1 degrees of freedom can be used to have an unbiased estimator when all you know is the sample mean.
Mathematically: en.m.wikipedia.org/wiki/Bias_of_an_estimator
The first example (in the examples tab) shows why it should be n-1, and not n, or n-whatever.
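The degrees-of-freedom argument in this comment (n-1 points plus the sample mean pin down the last point) can be sketched in a few lines. The function name and the example numbers are made up for illustration.

```python
def recover_last_point(known_points, sample_mean, n):
    """Return the one missing data point of an n-point sample.

    Hypothetical helper: because sample_mean * n equals the sum of all
    n points, the missing point is that total minus the known points.
    """
    if len(known_points) != n - 1:
        raise ValueError("expected exactly n - 1 known points")
    return n * sample_mean - sum(known_points)


if __name__ == "__main__":
    # Three of four points are known, and the sample mean of all four is 5.
    print(recover_last_point([2, 4, 9], 5, 4))
```

Once the sample mean is fixed, the last point has no freedom left to vary, which is the intuition the comment gives for n-1.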
Bam!!! I've watched lots of your videos after I discovered the one explaining the standard error. You make me understand stats concepts more clearly. Please continue making these awesome videos (machine learning too)!
5 dollars donated!
Thank you very, very much. I really appreciate it. :)
I click the like button before I watch it because I'm always sure I'll love it! Thanks so much for making this series. You'll never know how helpful it has been in my life
Hooray!!! Thank you very much! :)
the best accessible explanation I can find in the whole internet for this mystery. then just as I was about to say "aha! you missed out something!" towards the end of the video, you seemed to have read my mind and "p.s. if you are wondering why n-1 and not 0.5 or 2 .... " you are so so spot-on!
Thank you very much! :)
@@statquest I agree - best explanation i have found and i'm sharing this video with all my students. THANK YOU! So.... any chance that next video is coming out soon? (or has come out already?)
@@naysannaderi5135 I hope the next video will come out soon. Possibly in the next 4 months or so. I hope!
Thank you for this!! The first time I saw the formula for the sample variance I wondered why the n-1 was there, this is a great explanation.
Thanks!
hello, keej
i hate u mate
"Future is nooow, BAM " - #LOL #respect #welldone #thanks
Thank you! :)
came from calculating the mean, variance and SD video. Did not expect a proof for why variance = x-bar. This is a really good in depth video i've ever watched for statistics. Thank you very much.
bam!
This is excellent, I am looking forward to the next one.
Awesome and the best video with the most simplified explanation.
Thank you! :)
Thank you for this explanation. When I was learning stat in university, I did not understand well, why we divide by (n-1) instead of n to estimate sample variance. You explained it so clearly in a way that I will never forget what I learnt. Thank you Josh !!!
Hooray! :)
This is epic, never got a better or clearer explanation for this particular problem. Hats off!🙌
Thanks a ton!
This is the best explanation that I've come across for this. And I really liked that you gave a proof for general set of observations. Thanks a lot.
Awesome, thank you!
Thank you, 10 years of confusion made clear by this 15 mins of video.
Hooray! I'm glad the video was helpful. :)
Great explanation!!
I'm loving every second of your videos!!! Cheers!!
Thank you! :)
I'm eagerly awaiting the expected values quest! Thank you so much for making these videos, I love watching them before sleep.
Awesome! It's on the to-do list, but it might not be done for awhile. :(
@@statquest That's cool, take your time to keep making awesome videos. I still have loads of your videos on my to-watch list!
I admire this explanation... Amazing. I really look forward to the expected values video!
Thank you. I started working on the expected value video, but it will still be awhile before I finish since I have many other projects to work on.
This channel is just incredible, well done!
Thank you very much! :)
Big thanks from Taiwan. I have been asking why not dividing by n since high school...but all I get from my teacher was only "a rule of thumb". Now I know the reason behind and thanks to statquest. BAM!!
Hooray! :)
I have always hated statistics but I just today found this channel and this guy explains everything elegantly! ❤😊
Wow, thank you!
Thank you so much for the clear and simple explanation. This is an example for when showing the proof is better than only trying to give an intuition.
Thanks
BAM! I have not seen this concept explained better anywhere else ever. Have you gotten around to making the follow-up video on 'expected values' ? Can't thank you enough for your channel
I've got the video on expected values czcams.com/video/KLs_7b7SKi4/video.html and czcams.com/video/OSPr6G6Ka-U/video.html , but there are still a few steps to go after that... :(
Amazing explanation of why we use the square of the errors instead of the absolute value! I always asked myself that and all the teachers said it was just to give a bigger weight to the errors! We need the statquest on expected value!
Thanks! I'm working on the expected value, but it still might be a few months before it's ready.
I think for even n there wouldn't even be a minimum point, rather a flat line between the 2 middle samples
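The observation in this reply can be checked numerically. Below is a minimal sketch with made-up data (four points, so n is even) that compares the sum of absolute differences with the sum of squared differences for several candidate center values; the helper names are assumptions for illustration.

```python
def sum_abs_dev(data, v):
    """Sum of absolute differences between each data point and v."""
    return sum(abs(x - v) for x in data)


def sum_sq_dev(data, v):
    """Sum of squared differences between each data point and v."""
    return sum((x - v) ** 2 for x in data)


# Made-up sample with an even number of points; the two middle values are 3 and 7.
data = [1, 3, 7, 10]

if __name__ == "__main__":
    for v in [2, 3, 4, 5, 6, 7, 8]:
        print(v, sum_abs_dev(data, v), sum_sq_dev(data, v))
```

For this data the absolute-value sum is the same (13) for every candidate from 3 to 7, so it has a flat stretch rather than a single minimum, while the squared sum keeps changing and has one lowest point at the sample mean, 5.25.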
This video is such a gem! Thanks for explaining the root of this concept which is not easy to find even in statistics books.
Glad it was helpful!
I love the way you explain these topics, great work!
Thanks!
I don't even remember what I was confused about in particular, but I remember feeling very happy to see this video. Will revisit this in the following days. Psst, you're a gem ;)
Thank you very much! :)
I can only have love for these videos, thank you Josh and all the team if you have any.
Thank you! It's just me doing all this.
8:22 the way he said "Whaat" is so cute.. I'm in love
:)
Statquest, JBstatistics and Khan Academy.....You guys are just amazing !!.....Thank you for all you have done for us
Thank you! :)
Thank you, I didn't know about these channels
The last point about absolute value explains a lot! I was always wondering why squaring data is so much more common than taking absolute values!
bam! :)
I have been SO stressed out about a project I'm working on, and 3:15 made me laugh so hard!!! I didn't even realize how stressed out I was until I caught myself laughing for the first time in weeks. Thank you Josh!!! **sob**
Hooray!!! Good luck with your project. I hope it goes well. :)
THX!!! Looking forward for the STATQUEST on expected Values ;))))
Me too!
Super interesting! Thanks for your work!
Thanks!
Aah! Finally the end. What an excellent work by you!! Statquest rocks ❤.. Thank you sir. You helped a lot in my career ❤.
Thanks!
Thanks for explanation!
I understand that differences between the SAMPLE data and the sample mean are smaller than the differences between the SAMPLE data and the population mean. BUT! We are not interested in the difference between the SAMPLE data and the population mean, rather we are looking for the difference between the TRUE POPULATION data and the population mean (the population variance). And it's not clear why this value would be larger.
I mean sample data is centered around sample mean the same way population data is centered around population mean. Comparing sample data with population mean feels to be misleading.
The best estimate we can do is the estimate of the variance around the sample mean, which is probably an underestimate, but not always. So this is the best we can do.
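One way to look at the question in this thread: the sample is drawn from the population, so the squared differences between sample points and the population mean average out to the population variance. What shrinks the estimate is swapping in the sample mean. The sketch below checks the identity "sum of squares around the population mean = sum of squares around the sample mean + n * (sample mean - population mean)^2" on a made-up sample; the function name, the population mean of 10, and the seed are assumptions.

```python
import random


def squared_deviation_sums(sample, population_mean):
    """Return (SS around sample mean, SS around population mean, gap).

    Hypothetical helper: gap is n * (sample_mean - population_mean) ** 2,
    the amount by which the first sum falls short of the second.
    """
    n = len(sample)
    sample_mean = sum(sample) / n
    ss_sample = sum((x - sample_mean) ** 2 for x in sample)
    ss_population = sum((x - population_mean) ** 2 for x in sample)
    gap = n * (sample_mean - population_mean) ** 2
    return ss_sample, ss_population, gap


# Made-up sample of 5 points from a normal population with mean 10.
rng = random.Random(1)
sample = [rng.gauss(10.0, 3.0) for _ in range(5)]
ss_sample, ss_population, gap = squared_deviation_sums(sample, 10.0)

if __name__ == "__main__":
    print("around sample mean:    ", ss_sample)
    print("around population mean:", ss_population)
    print("gap:                   ", gap)
```

Because the gap is never negative, the sum around the sample mean can never exceed the sum around the population mean, for any sample.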
Your work is impeccable. BAM!
Thank you!
Thanks so much for the explanation, super clear as always
Glad it was helpful!
This is awesome explanation. Waiting for quest on 'Expected Values'....BAM!
Me too. Hopefully I can get to it soon.
Greatest explanation so far!
Thank you! :)
Oh no... I'm falling deeper and deeper into this rabbit hole
:)
You are a very great teacher, i like your coaching style, keep going on!
Thank you! 😃
Excellent. Thank you for a great explanation.
Glad you enjoyed it!
Mind = Blown.
Thankyou from Indonesia.
Thanks!
this is literally what I was trying to get a clear understanding on in the last few days? what are the chances? no seriously what are the chances?
That's awesome! :)
The most impressive explanation I've ever seen.
Thanks!
Thank you St Josh for this illuminating explanation :)
My pleasure!
Very nice explanation, god bless you josh!
Thank you! :)
Good job Josh!! Waiting for StatQuest on Expected Values! I am the one wondering why not dividing by 'n-0.5' or 'n-2'
Thanks!
im currently learning data analytics and trying to figure out ab testing and bam! here i am! thank you so much for making statistics fun and easy to understand! double bam!
Happy to help!
You left in a cliff hanger of expected values :((
Love your videos tho, thanks for these!
I'm working on it, but everything I do takes longer than I would like. :)
Perfect !!! thanks for posting.
Thanks!
I finally know why n-1 is used. Thank you so much!
Bam!
thank you and this is very helpful!!
I'm so glad!
Brilliantly explained....
Thank you very much! :)
ASTOUNDING EFFECTS & EXPLANATIONS!
SUBSCRIBED TRIPLE BAM!!!
bam!
I usually hit like after the first BAMMM. This is some super great stuff Josh.
Thank you very much! :)
Awesome and Thank you!
No problem!
Your videos are so helpful
Thanks! :)
Great as always! Thanks Prof.
Thanks! :)
@@statquest I know that you have some videos on quantiles on your channel. I was wondering if you could think about the possibility of a video on quantile regression and its possible applications in genetics, especially in population genetics. I have found some resources on youtube and online which mainly focus on social sciences. Anyways, thank you again.
Thanks a lot!
You are so awesome ! Thank you Josh :)
Thank you! :)
This the best explanation ever
Thank you!
baaammm! subscribed.
Awesome! :)
BAAM! me too
Hey josh, yet another cool explanation.
Thank you! :)
I wish you were my stats teacher!! Amazing job!!!
Thank you! :)
@@statquest really waiting for the expected value video to get explanation of n-1. When can we expect it?
@@shouryanand456 Unfortunately, it might be a while. I've got a full plate until after the summer.
Man, I always thought that statistics doesn’t make any sense at all and that people should just blindly chug into weird formulas without questioning, but this was absolutely mind opening. Not even khan academy could explain the proof!
Thanks!
Man, you are great. From where did you learn these concepts? Keep making videos and enlighten us. Thank you.
Thanks! :)
@@statquest When I try to learn these concepts they seem complicated to me. From where did you learn these concepts?
@@shubhamtalks9718 The concepts seem complicated because people that do not really understand them try to teach them.
How did I learn them? Years of really hard work. I read everything I can about a subject, then I re-read it. Then I re-read it again. Then I make a program based on my ideas and see what happens. Then I re-read everything over again. And sooner or later I figure it out. But it takes a lot of time and a lot of work. Sometimes I worry I will not succeed, and sometimes I fail, but I keep trying anyway.
@@statquest Thanks😁
wow that n-1 has something to do with E(X)? Im waiting for it!
"The future is now" I'm dying
BAM! :)
This is Amazing! BAM!!
Thanks!
I really want to understand why we use n-1 instead of substituting any other number instead of 1. I'm guessing it has something to do with the way we approximate the mean and the variance. I think it's related to properties the normal distribution has and such. I think that to truly understand that analytically I'd have to integrate over all possible outcomes while taking into account all the probabilities and then calculating the average. It really excites me, but I don't know where I can find the information needed to understand the subject in more depth. Can you give me some advice on what textbooks I should read, please? I'd really really appreciate that!
See: online.stat.psu.edu/stat415/book/export/html/886
@@statquest thank you, I will definitely read that!
awesome video, do you have any recommended channel to learn derivative and calculus ?
I have a video on The Chain Rule Here: czcams.com/video/wl1myxrtQHQ/video.html but I have heard that Khan Academy is good for learning derivatives.
I just read the wiki and found that even when dividing by n-1, we still underestimate the standard deviation (although we don't underestimate the variance anymore). I feel that's somewhat mind-blowing, since calculating sample std is such an ordinary job for statisticians, and it is surprisingly BIASED (and I am sure the standard error formula is also biased)...
interesting
@@statquest Yeah. This is the wiki page.
en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation
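The point raised in this thread can be illustrated with a simulation. This is a sketch under stated assumptions, not anything from the video: a normal population with standard deviation 2, samples of size 5, a fixed seed, and a hypothetical function name.

```python
import math
import random


def average_sd_and_variance(n=5, trials=20000, sigma=2.0, seed=0):
    """Return (average sample SD, average sample variance), both using n - 1.

    Hypothetical helper: draws many small samples from a normal population
    with mean 0 and averages the two estimates across all samples.
    """
    rng = random.Random(seed)
    total_sd = 0.0
    total_var = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        var = sum((x - sample_mean) ** 2 for x in sample) / (n - 1)
        total_var += var
        total_sd += math.sqrt(var)  # square root of the n - 1 variance
    return total_sd / trials, total_var / trials


if __name__ == "__main__":
    avg_sd, avg_var = average_sd_and_variance()
    print("true SD 2.0, average sample SD:            ", round(avg_sd, 3))
    print("true variance 4.0, average sample variance:", round(avg_var, 3))
```

Under these assumptions the average variance should land near 4 while the average standard deviation lands below 2, because taking a square root does not commute with averaging.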
I wanted to know why we deduct exactly 1, but I guess that only takes 20 additional minutes to explain. Hooraay!
Thanks for the videos :)
It's true. We have to dive into expected values and that is a whole new topic.
8:21 -> I will watch a thousand times and I will laugh out loud a thousand times 😂
Hooray! :)
Great video!
Where is the one about Expected Values?
I cannot wait with such a cliffhanger! GoT finale can wait...
Very funny! Yes, I have my work to do. I hope to get to expected values before too long.
@@statquest Thanks a lot for responding! ... and sorry, as I noticed after reading more comments, that you had already answered this question many times. Quest on!
Thanks!
Wow!!! Thank you so much for supporting StatQuest!!! :)
Clarity brings understanding
Bam! :)
Waiting for StatQuest on Expected Values.
I love your videos...
DOUBLE BAM !
Thank you! :)
Great video! Which book did you get this explanation from?
Ummm....I just did the math.
Great video! Can you share link of StatQuest on Expected Values that explains why divide by n-1 and not n-0.5 or not n-2? Thanks!
I hope to do that one day.
This is gold.
Thank you!
BAM! Thank you !
:)
Thank you! These videos have really helped me in understanding the concepts clearly. However, I wanted to know if you have uploaded the video on Expected Values and if you could provide the link to the same.
I haven't uploaded the expected value StatQuest yet, but I am working on it.
@@statquest Alright. Thank you
mind blown!
BAM!
Amazing video!
I think that you should teach another subject. Maybe MathQuest? That would be amazing!
Maybe one day!
I love the humor included in these videos, hahahah!
Thanks!
Waited the whole video to know why it was n-1 and not n-2 etc... "that mystery will be resolved in the next episode"
Felt like watching an overstretched TV series in some way haha.
I understand it's essential to show first that the sample variance underestimates the true variance, but you could mention earlier that you'll not focus on "why" it is n-1. :p
Thank you though, wonderful content!
I tried to be careful with the title of this video with "Why dividing by N underestimates the variance" instead of "Why n-1 gives us an unbiased estimate".
That being said, I really wanted to explain exactly why n-1 works, but the proof is relatively advanced.