Variance: Why n-1? Intuitive explanation of concept and proof (Bessel's correction)

statsandscience

15,033 views

Added 23 August 2024

Comments • 71

@J-sh4gf 6 months ago
“The expected difference between the “correct” formula of the variance and the “wrong” one with n and the sample mean, is equal to the variance of the sample mean!”
This one sentence untied a knot! Thank you very much. I have watched various explanations of degrees of freedom etc., and this video is by far the best I've seen.
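The quoted sentence can be written out in two lines. This is only a sketch, assuming independent observations x_1, ..., x_n with true mean \mu and variance \sigma^2 (the notation is an assumption made here, not taken from the video):

```latex
\begin{align*}
\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2
  &= \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2 + (\bar{x}-\mu)^2 \\
\mathbb{E}\!\left[\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2\right]
  - \mathbb{E}\!\left[\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2\right]
  &= \mathbb{E}\!\left[(\bar{x}-\mu)^2\right]
   = \operatorname{Var}(\bar{x})
   = \frac{\sigma^2}{n}
\end{align*}
```

Since the first expectation equals \sigma^2, the version with n and the sample mean has expectation \sigma^2 (n-1)/n, which is what multiplying by n/(n-1) undoes.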
@cannot-handle-handles 2 years ago ⁺⁵
The explanation given at 24:55 why we divide by 2 ("each value is in here twice") does not seem intuitive: If we only added (x_i-x_j)^2 for i
@statsandscience 2 years ago
Not sure if I understand you correctly, but your suggestion is basically to only look at one of the triangles of the matrix without the diagonal. Is that correct? In that case you would have to divide by n(n-1) only and not by 2 to get the same value.
@cannot-handle-handles 2 years ago ⁺¹
@statsandscience I'll try and elaborate more:
The sum of all the squares is 812. Divided by n^2 (because that's the number of squares we're considering), that's 16.571… Divided by 2, that's 8.28571… And finally, with Bessel's correction, it's 9.666…
The sum of half of the squares is 406. Divided by n(n-1)/2 (because that's the number of squares we're considering), that's 19.333… So we still have to divide by 2 to get 9.666…
I'm not saying the formula is wrong, just the explanation ("each value is in here twice"). In both cases, you have to divide by the number of squares AND by 2. So the division by 2 is not explained by counting the squares twice.
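The arithmetic in this comment can be reproduced with a few lines of Python. The data below are hypothetical (the video's actual values are not given in this thread); they were merely chosen so that n = 7 and the totals come out as the 812 and 406 quoted above.

```python
from itertools import combinations, product

# Hypothetical data (NOT taken from the video), chosen so that the totals
# match the 812 and 406 quoted in this thread.
data = [1, 3, 6, 6, 8, 8, 10]
n = len(data)

# Full n x n table of squared differences, zero diagonal included.
full_sum = sum((a - b) ** 2 for a, b in product(data, repeat=2))
# One triangle only: every unordered pair once, no diagonal.
half_sum = sum((a - b) ** 2 for a, b in combinations(data, 2))

mean = sum(data) / n
ss = sum((x - mean) ** 2 for x in data)  # squared deviations from the mean

var_n = ss / n              # divisor n
var_bessel = ss / (n - 1)   # divisor n - 1 (Bessel's correction)

print(full_sum, half_sum)                            # 812 406
print(full_sum / n**2 / 2, var_n)                    # mean over n^2 cells, halved
print(half_sum / (n * (n - 1) / 2) / 2, var_bessel)  # mean over 21 cells, halved
```

In both rows the sum is first averaged over the number of cells considered and then halved, which matches the point made in the comment: the division by 2 is needed in either case.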
@cannot-handle-handles 2 years ago
@statsandscience But the number of squares in one of the triangles is 1+2+3+4+5+6=21, not 42.
@statsandscience 2 years ago ⁺¹
@cannot-handle-handles Yes, I deleted my comment because I noticed my mistake prior to your answer. What you say makes sense; I had not done this test before. Do you have a good intuitive explanation for the 2?
@statsandscience 2 years ago ⁺¹
Is it because you basically calculate the means of all pairs of points?
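One possible reading of the 2, offered as a sketch rather than as the video's own argument: for two independent draws X_i and X_j from the same population with variance \sigma^2 (an assumption), a single squared difference is on average twice the variance, because both points contribute their own spread.

```latex
\mathbb{E}\big[(X_i - X_j)^2\big]
  = \operatorname{Var}(X_i - X_j) + \big(\mathbb{E}[X_i - X_j]\big)^2
  = \sigma^2 + \sigma^2 + 0
  = 2\sigma^2 \qquad (i \neq j)
```

Under this reading, the mean squared difference between distinct pairs estimates 2\sigma^2, so halving it gives the variance regardless of how many pairs were averaged.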
@MathAndComputers 3 years ago ⁺⁵
Thanks for the explanations! I'd been meaning to learn about this for ages, but just hadn't gotten around to it, haha. 😅 Something that might be helpful is that if you put times and labels in a list in the description, YouTube will now automatically split up the play bar into chapters, as long as the first one is 0:00, so something like:
0:00 - Intro
1:33 - Terminology
4:28 - Estimating the mean or variance
9:20 - Why is the version with n biased?
17:03 - Why does n-1 save it? (explanation 1)
21:56 - Why does n-1 save it? (explanation 2)
28:42 - Summary
@statsandscience 3 years ago ⁺²
That was super helpful, thanks! And extra thanks for providing all the correct time stamps!
@phil2888 8 months ago ⁺¹
This is great, I have grappled with this for quite a while.
@user-kh5ju1du4d 10 months ago
Thank you SO much for this video. It has been so hard to find a proper explanation of this.
@DanTee2_718 1 year ago ⁺²
This was honestly such a great watch, thank you for the video
@statsandscience 1 year ago ⁺¹
Thank you!
@danielheckel2755 2 years ago ⁺¹
Very enjoyable explanation. Thank you! Greetings from Mexico.
@mariuskornovan5520 3 months ago
Great video! Helped me finally understand the derivation of the sample standard deviation
@gokulkrishna2667 2 years ago ⁺¹
The greatest video on this aspect on the internet!
@statsandscience 2 years ago
Thanks, glad you liked it!
@Sid-ge9vb 1 year ago
This is an amazing explanation, thank you so much! I was so frustrated by the hand-wavy explanations on YouTube, even in lectures!
@statsandscience 1 year ago ⁺¹
Thank you, I really appreciate it!
@milanradovanovic3693 1 year ago ⁺¹
This puzzled me for a long time... Thanks for the explanation... P.S. I always thought it was a spelling mistake in the book(s).
@shpensive 2 years ago
Fantastic, I've been wondering about this for a long time.
@osaabd390 1 year ago ⁺¹
Thank you so much for this great video. I appreciate it. I have to give you feedback though on the quality of the sound. I sometimes found it difficult to hear what you say. There are two things I would suggest, as I think your understanding of these concepts and your ability to communicate them visually should not go to waste: 1. a better microphone (Shure and Rode are the best and not that expensive) and 2. read slower pleeaassee. I had to stop multiple times and go back to fully understand what you say. If you think you need to keep your videos below a certain time threshold, then cut unnecessary words from your script, use shorter words, and trim wordy phrases (e.g. use 'most' instead of 'the majority of'). Thanks again for the great effort, keep it going.
@statsandscience 1 year ago
Hey, thank you so much for taking the time to give detailed feedback. 1) I am actually using such a microphone, but maybe it wasn't well positioned? I will check that. 2) Thanks, I will try! It is not that I want to shorten videos, I am just used to talking fast I guess...
@osaabd390 1 year ago ⁺¹
@statsandscience Good luck with your work and thank you from the bottom of my heart, I really do understand why we divide by n-1 now :D .
@Number_Cruncher 3 years ago ⁺²
Nice, now it is clear to me.
@statsandscience 3 years ago
Great, thank you for the comment!
@lurkertech 10 months ago ⁺¹
Thanks for the best video I've ever seen on the n-1. Referring to the key questions at 8:37 I was hoping to find the answer to a more specific question #2: not just "why isn't it n-2 or n-pi" but "why does the correction factor n/(n-1) not depend on the ratio between the sample size and the population size?" That is, if I know the population is 1000, and I choose a sample of 10 vs. a sample of 999, why wouldn't I use different correction factors to get the best answer? After all, my sample of 999 is going to be darn close to the true population variance whereas my sample of 10 is going to be way off. Your video kind of implies, but doesn't say directly (wish it did) that the n-1 "solution" provides the "average" correction factor you might need for any possible sample size relative to the population size, or to say the same thing in another way, the n-1 is the best you can do if you don't actually know the population size. Is that correct? If we DO know the population size exactly, then can we choose a better correction factor that is tailored to that particular sample size : population size ratio?
@lurkertech 10 months ago ⁺¹
To make an even clearer statement of the problem...suppose my sample size is always 499. Now suppose that the actual population is either 500 or 1000. So that's 2 cases in total. According to the n-1 rule, I should apply the same correction (499/(499-1)) to 499 samples in a 500 population as I should apply for 499 samples in a 1000 population. That doesn't seem to be the best we can do if we know the actual population size, since I should not need to correct as hard when sample size is very close to population size. So is the n-1 rule designed only for the case where one does not know the population size? If we do know the population size, can we do better? Using what formula?
@statsandscience 9 months ago ⁺²
Sorry that I did not get around to answering this question earlier. You put a lot of effort into this and I hope you still benefit from an answer!
When you take intro stats classes, a quite basic assumption that lurks basically everywhere is that the population you are dealing with is infinite. Of course, this assumption is also basically always wrong. Usually that does not matter though, as populations are usually "big enough", so that wrong estimates of the actual population size do not influence our outcomes to a degree we would care about. The same is true here in this formula: It is not the "average" correction factor for all possible samples, but the one for an infinite population - again, it usually does not matter what the actual size is, except when the sample size comes close to the population size.
Now you correctly identified that this can cause problems because in this case you actually know a lot more than what the formula is giving you credit for.
What people came up with for this case is the Finite Population Correction (FPC) - I would advise you to just google it and look for yourself as the space here is quite limited (of course you can also ask follow-up questions about that here if you like!). However, in a nutshell this correction does what you pointed out - it keeps you from correcting "too hard".
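For a small finite population the effect can be checked by brute force. The sketch below is only an illustration under stated assumptions: the population values are hypothetical, samples are drawn without replacement, and the textbook form of the finite population correction, (N - n)/(N - 1) applied to the variance of the sample mean, is assumed rather than taken from the video.

```python
from itertools import combinations
from statistics import mean, pvariance, variance

# Hypothetical finite population of N = 7 values (not the video's data).
population = [1, 3, 6, 6, 8, 8, 10]
N = len(population)
n = 3  # sample size, drawn WITHOUT replacement

sigma2 = pvariance(population)  # population variance, divisor N

# Enumerate every possible sample without replacement (all equally likely).
samples = list(combinations(population, n))
avg_s2 = mean(variance(s) for s in samples)           # average n-1 estimate
var_of_means = pvariance([mean(s) for s in samples])  # spread of sample means

# Without replacement, the n-1 estimate overshoots sigma2 by the factor N/(N-1).
print(avg_s2, sigma2 * N / (N - 1))

# The finite population correction shrinks the variance of the sample mean.
fpc = (N - n) / (N - 1)
print(var_of_means, sigma2 / n * fpc)
```

In this setting the n-1 estimate averages out to the population variance computed with divisor N - 1, so multiplying it by (N - 1)/N would recover the divisor-N value. For large N both factors approach 1, which is consistent with the infinite-population assumption described above.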
@lurkertech 8 months ago
@statsandscience Thank you, it is a very useful answer. I didn't know about that assumption, and so when your examples had a population size of 7, I was extra confused. Thanks for clearing it up. That makes it clearer why the correction should be greater when the sample size is smaller. Maybe mention that assumption in your video description to help others in the same boat as me?
@chris_7711 7 months ago
Thank you very much! Very insightful!
@pramodabandaru3566 4 months ago ⁺¹
I did not get how (sample mean - population mean) squared, i.e. the variance of the sample mean, is equal to the variance of the population divided by n, the sample size. Could anyone please explain? 20:00
@ckq 1 year ago ⁺¹
Before I watch, basically the average cuts the variance by a factor of n, but when we find the difference between a sample value and the sample average, the average contains 1/nth of that sample so the calculated variance is shrunk by a factor of (1-1/n).
@statsandscience 1 year ago
I am not sure if I understand correctly, but I think there might be something to framing the mean as containing 1/n parts of the information within the sample... Would you say you were right after watching?
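The (1-1/n) idea can be made precise with a short calculation. This is a sketch assuming independent observations with common variance \sigma^2 (notation assumed here): the sample mean contains 1/n of x_i itself, so the deviation x_i - \bar{x} varies less than x_i - \mu does.

```latex
\begin{align*}
x_i - \bar{x}
  &= \Big(1 - \tfrac{1}{n}\Big)x_i - \tfrac{1}{n}\sum_{j \neq i} x_j \\
\operatorname{Var}(x_i - \bar{x})
  &= \Big(1 - \tfrac{1}{n}\Big)^2\sigma^2 + \frac{n-1}{n^2}\,\sigma^2
   = \frac{(n-1)^2 + (n-1)}{n^2}\,\sigma^2
   = \Big(1 - \tfrac{1}{n}\Big)\sigma^2
\end{align*}
```

Averaging the squared deviations from the sample mean therefore gives (1 - 1/n)\sigma^2 in expectation, and multiplying by n/(n-1) restores \sigma^2.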
@jkally123 2 years ago ⁺³
How did he get to the statement made at 20:00 - that var(sample mean) is equal to population variance divided by n?
@statsandscience 2 years ago ⁺¹
I brushed over this a bit because it was not the focus here. Intuitively, it makes sense I think that the variance of the sample mean must be smaller than the population variance and that this depends on n because as I explained, there is no way to get the most extreme observed values as means, and the mean will always become "less extreme" in comparison the higher n is. However, I don't know an intuitive explanation for the exact formula, but the reasoning goes like this: You try to calculate the variance of the sample mean, that is, the sum of the observations divided by n, like so: Var((obs1 + obs2 + obs3 + ...)/n). You can rewrite this to Var((1/n)*obs1 + (1/n)*obs2 + (1/n)*obs3 + ...). A linear combination of independent observations like this has a variance equal to the sum of whatever the factor is squared (in this case 1/n^2) times the variance of the individual components: (1/n^2)*Var(obs1) + (1/n^2)*Var(obs2) + ... When you then assume identical variances for the observations, this equals (1/n^2)*n*Var(obs), which is Var(obs)/n. You can find that a bit nicer formatted also here: online.stat.psu.edu/stat414/lesson/24/24.4
Hope this helps, thank you for the comment!
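Typeset, the steps described in this reply read as follows, assuming independent observations x_1, ..., x_n that all have the same variance \sigma^2 (independence is what allows the variance of the sum to split into a sum of variances):

```latex
\begin{align*}
\operatorname{Var}(\bar{x})
  &= \operatorname{Var}\!\Big(\frac{1}{n}\sum_{i=1}^{n} x_i\Big)
   = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(x_i)
   = \frac{1}{n^2}\cdot n\,\sigma^2
   = \frac{\sigma^2}{n}
\end{align*}
```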
@Titurel 7 months ago
I was wondering too!
@ajaydalvi1378 2 years ago ⁺¹
Finally understood!
@faresmhaya 5 months ago
The explanation for why we divide by 2n² in the second formula is not intuitive to me, despite it working on a small example I tested. I sense a redundancy in dividing by both 2 and n². If we have two instances of each distance measurement, okay we can divide by two, reducing the number of distances we're taking into consideration. But why would we then need to also divide by a second n if we reduced the number of distances we're taking into consideration from n² when we divided by 2?
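One way to see that both divisions are needed is an algebraic identity that holds for any data x_1, ..., x_n. It is given here as a sketch, not as the video's own derivation:

```latex
\begin{align*}
\sum_{i=1}^{n}\sum_{j=1}^{n}(x_i-x_j)^2
  &= \sum_{i}\sum_{j}\big[(x_i-\bar{x})-(x_j-\bar{x})\big]^2 \\
  &= n\sum_{i}(x_i-\bar{x})^2 + n\sum_{j}(x_j-\bar{x})^2
     - 2\Big(\sum_{i}(x_i-\bar{x})\Big)\Big(\sum_{j}(x_j-\bar{x})\Big) \\
  &= 2n\sum_{i}(x_i-\bar{x})^2 \\
\frac{1}{2n^2}\sum_{i}\sum_{j}(x_i-x_j)^2
  &= \frac{1}{n}\sum_{i}(x_i-\bar{x})^2
\end{align*}
```

The cross term vanishes because deviations from the sample mean sum to zero. Read this way, dividing by n² averages over the n² cells of the table, and the remaining factor 2 appears because every cell combines the deviations of two points, not because cells are counted twice.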
@andrew.schaeffer4032 1 year ago ⁺¹
What kind of statistics exactly do I need to learn in order to follow along? This looks really interesting, but I don't fully understand how it all works. Thanks!
@statsandscience 1 year ago
You will probably find the general concept in any applied statistics textbook. As I said, it is a basic step from descriptive statistics, where you only draw conclusions about a particular sample, to inferential statistics, where you use a sample to draw conclusions about a bigger population. That is basically what is always needed and taught in applied statistics. The issue is that those books tend to be shallow in that regard, and other books with more detail might only be helpful with a serious understanding of the math behind it. That is why I made the video: to bridge between these two.
Let me know if that was what you had in mind!
@user-ws5sq8fm4k 1 year ago ⁺¹
Thank you for this great video.
I hope you continue uploading more videos.
Do you have a written text for this video?
As a non-native English speaker, I have some difficulty following your speech. I need to replay many parts of the video to catch the words.
@statsandscience 1 year ago
Thank you! Yes, I do have that and I always wanted to make proper subtitles but just did not get to it yet. YouTube auto-generates subtitles as you probably know, but I don't really like them. I will try to look into that soon and let you know.
@user-ws5sq8fm4k 1 year ago ⁺¹
@statsandscience
Thank you for your reply. I will wait for this precious script.
@statsandscience 1 year ago
@user-ws5sq8fm4k English subtitles are up now! I hope you will find them helpful
@iwatchtvwithportal5367 10 months ago ⁺²
I always thought the n-1 was related to degree of freedom spent, but actually it isn't!
@statsandscience 9 months ago ⁺¹
Well, it is, but you can sort of get around that in an explanation like this one. If you are interested, feel free to watch my video on degrees of freedom. :)
@Hossein_am98 1 year ago ⁺¹
Thanks for the video, really good way to explain it!
Frankly, to me it seemed you were reading from a written text, because your speaking was too constant (no stress on the words, no ups and downs, no nothing) and that made it really difficult for me to understand what you're saying
@statsandscience 1 year ago
Thanks! I will try to improve speaking next time!
@hugoharada5301 2 years ago
Loved the video. Thanks!!
@user-ws5sq8fm4k 1 year ago ⁺¹
Does the explanation using pairwise differences apply in sampling without replacement where diagonal zeros don't occur?
@statsandscience 1 year ago
They would still occur, wouldn't they? Because the margins of the table are identical either way, so there would be zeros on the diagonal.
Sampling without replacement is also a separate issue, as for instance discussed here: stats.stackexchange.com/questions/70124/unbiased-estimator-of-variance-for-samples-without-replacement
@user-ws5sq8fm4k 1 year ago
@statsandscience
Thank you for your reply. Zeros occur when we subtract each data point from itself and this doesn't happen in case of sampling without replacement.
@michaelchareka1175 1 year ago ⁺¹
Please upload more videos. I’m begging
@statsandscience 1 year ago
Thanks, glad you liked it!
@se0271 2 years ago
So instead of the sample lying somewhere much lower than the true population mean, what if it's lying much higher? Would it be correct to use n+1 instead of n-1 in order to deliberately make the sample variance smaller?
@statsandscience 2 years ago
The main problem is that you don't know that. Remember that we do all this with samples because we do not have access to the population - and this is a problem that happens because of sampling, but not when you can use the population values.
Imagine a student who goes to the school cafeteria every day, and who knows that the staff tends to hand out portions that are too small most days. So they ask for something extra every day (and receive it). This will move the portion size to the optimum most days, but on days where the portion size was correct in the first place or even greater, the request will make it worse. However, this is still better because on most days the size is too small, so the average will be closer to the optimum. Does that help?
@se0271 2 years ago
@statsandscience Thank you for your response. It definitely helps but I still have the question of how you would know that the data values from a sample are too small. You cannot infer that it's too large, but why can you infer that it's too small? Shouldn't it go both ways? Maybe naturally, samples tend to gravitate around smaller data values as with the portion size example you gave? If that's the case then it does actually make sense since you'd typically not want to exceed the normal portion size so you don't run out (and this idea of scarcity can be applied to any other examples).
@statsandscience 2 years ago
@se0271 You indeed don't know that for a particular value. It can be too big or too small. It is just more likely that it is smaller. I'm afraid that if I went into more detail I would just repeat what I said in the video, but if you have specific questions I would be happy to help!
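The one-sidedness can be checked by enumerating every possible sample from a small population. The sketch below rests on assumptions: the population values are hypothetical, and draws are made with replacement to mimic independent sampling from an effectively infinite population. It shows that the sum of squared deviations around the sample mean is never larger than the sum around the true mean, and that the divisor-n estimate therefore averages out too small.

```python
from itertools import product
from statistics import mean, pvariance, variance

# Hypothetical population (not the video's data). Drawing WITH replacement
# mimics independent draws from an effectively infinite population.
population = [1, 3, 6, 6, 8, 8, 10]
n = 3
mu = mean(population)
sigma2 = pvariance(population)

samples = list(product(population, repeat=n))  # all 7**3 equally likely samples

biased = [pvariance(s) for s in samples]   # divisor n, around the sample mean
unbiased = [variance(s) for s in samples]  # divisor n - 1

# Deviations around the sample mean can only be smaller than (or equal to)
# deviations around the true mean; the small tolerance absorbs float rounding.
never_larger = all(
    sum((x - mean(s)) ** 2 for x in s) <= sum((x - mu) ** 2 for x in s) + 1e-9
    for s in samples
)

# Share of samples whose divisor-n estimate falls below the true variance.
too_small = sum(b < sigma2 for b in biased) / len(samples)

print(never_larger)
print(mean(biased), sigma2 * (n - 1) / n)
print(mean(unbiased), sigma2)
print(too_small)
```

Individual divisor-n estimates can still land above or below the true variance; it is the average over all samples that is too small, by exactly the factor (n-1)/n in this setting.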
@se0271 2 years ago
@statsandscience I see, I appreciate the explanation - thank you!
@sayarsine6479 1 year ago
legendary
@statsandscience 1 year ago
Thanks!
@KarthikNaga329 3 years ago
What software do you use to type math equations and animate them in videos? Thanks!
@statsandscience 3 years ago
It's honestly just PowerPoint and I wouldn't recommend it for standard use, I am sure there are better options out there...
@koramawin6134 2 years ago
Subscribed!
@user-ws5sq8fm4k 1 year ago
If you permit me, I may put an Arabic translation on your video. If you provide me with the English script, it will facilitate my work.
@statsandscience 1 year ago ⁺¹
Yes, that sounds great! I think you should now be able to just download it after I have added the subtitles.
@SanatanYogii 2 years ago ⁺¹
upload more videos
@funfair-bs7wf 1 year ago
Great video, but would be even greater if you articulated a bit more 😉
